#. Named values are represented as a string of characters with their
prefix. For example, ``%foo``, ``@DivisionByZero``,
``%a.really.long.identifier``. The actual regular expression used is
- '``[%@][a-zA-Z$._][a-zA-Z$._0-9]*``'. Identifiers which require other
+ '``[%@][a-zA-Z$._][a-zA-Z$._0-9]*``'. Identifiers that require other
characters in their names can be surrounded with quotes. Special
characters may be escaped using ``"\xx"`` where ``xx`` is the ASCII
code for the character in hexadecimal. In this way, any character can
- be used in a name value, even quotes themselves.
+ be used in a name value, even quotes themselves. The ``"\01"`` prefix
+ can be used on global variables to suppress mangling.
#. Unnamed values are represented as an unsigned numeric value with
their prefix. For example, ``%12``, ``@2``, ``%44``.
#. Constants, which are described in the section Constants_ below.
.. code-block:: llvm
- %0 = add i32 %X, %X ; yields {i32}:%0
- %1 = add i32 %0, %0 ; yields {i32}:%1
+ %0 = add i32 %X, %X ; yields i32:%0
+ %1 = add i32 %0, %0 ; yields i32:%1
%result = add i32 %1, %1
This last way of multiplying ``%X`` by 8 illustrates several important
#. Unnamed temporaries are created when the result of a computation is
not assigned to a named value.
#. Unnamed temporaries are numbered sequentially (using a per-function
- incrementing counter, starting with 0). Note that basic blocks are
- included in this numbering. For example, if the entry basic block is not
- given a label name, then it will get number 0.
+ incrementing counter, starting with 0). Note that basic blocks and unnamed
+ function parameters are included in this numbering. For example, if the
+ entry basic block is not given a label name and all function parameters are
+ named, then it will get number 0.
It also shows a convention that we follow in this document. When
demonstrating instructions, we will follow an instruction with a comment
}
; Named metadata
- !1 = metadata !{i32 42}
- !foo = !{!1, null}
+ !0 = metadata !{i32 42, null, metadata !"string"}
+ !foo = !{!0}
This example is made up of a :ref:`global variable <globalvars>` named
"``.str``", an external declaration of the "``puts``" function, a
defining module will bind to the local symbol. That is, the symbol
cannot be overridden by another module.
-.. _namedtypes:
+A symbol with ``internal`` or ``private`` linkage must have ``default``
+visibility.
+
+.. _dllstorageclass:
DLL Storage Classes
-------------------
exists for defining a dll interface, the compiler, assembler and linker know
it is externally referenced and must refrain from deleting the symbol.
+.. _tls_model:
+
+Thread Local Storage Models
+---------------------------
+
+A variable may be defined as ``thread_local``, which means that it will
+not be shared by threads (each thread will have a separated copy of the
+variable). Not all targets support thread-local variables. Optionally, a
+TLS model may be specified:
+
+``localdynamic``
+ For variables that are only used within the current shared library.
+``initialexec``
+ For variables in modules that will not be loaded dynamically.
+``localexec``
+ For variables defined in the executable and only used within it.
+
+If no explicit model is given, the "general dynamic" model is used.
+
+The models correspond to the ELF TLS models; see `ELF Handling For
+Thread-Local Storage <http://people.redhat.com/drepper/tls.pdf>`_ for
+more information on under which circumstances the different models may
+be used. The target may choose a different TLS model if the specified
+model is not supported, or if a better choice of model can be made.
+
+A model can also be specified in a alias, but then it only governs how
+the alias is accessed. It will not have any effect in the aliasee.
+
+.. _namedtypes:
+
Structure Types
---------------
LLVM IR allows you to specify both "identified" and "literal" :ref:`structure
types <t_struct>`. Literal types are uniqued structurally, but identified types
are never uniqued. An :ref:`opaque structural type <t_opaque>` can also be used
-to forward declare a type which is not yet available.
+to forward declare a type that is not yet available.
An example of a identified structure specification is:
Global variables define regions of memory allocated at compilation time
instead of run-time.
-Global variables definitions must be initialized, may have an explicit section
-to be placed in, and may have an optional explicit alignment specified.
+Global variables definitions must be initialized.
Global variables in other translation units can also be declared, in which
case they don't have an initializer.
-A variable may be defined as ``thread_local``, which means that it will
-not be shared by threads (each thread will have a separated copy of the
-variable). Not all targets support thread-local variables. Optionally, a
-TLS model may be specified:
-
-``localdynamic``
- For variables that are only used within the current shared library.
-``initialexec``
- For variables in modules that will not be loaded dynamically.
-``localexec``
- For variables defined in the executable and only used within it.
-
-The models correspond to the ELF TLS models; see `ELF Handling For
-Thread-Local Storage <http://people.redhat.com/drepper/tls.pdf>`_ for
-more information on under which circumstances the different models may
-be used. The target may choose a different TLS model if the specified
-model is not supported, or if a better choice of model can be made.
+Either global variable definitions or declarations may have an explicit section
+to be placed in and may have an optional explicit alignment specified.
A variable may be defined as a global ``constant``, which indicates that
the contents of the variable will **never** be modified (enabling better
LLVM allows an explicit section to be specified for globals. If the
target supports it, it will emit globals to the section specified.
+Additionally, the global can placed in a comdat if the target has the necessary
+support.
By default, global initializers are optimized by assuming that global
variables defined within the module are not modified from their
case, the extra alignment could be observable: for example, code could
assume that the globals are densely packed in their section and try to
iterate over them as an array, alignment padding would break this
-iteration.
+iteration. The maximum alignment is ``1 << 29``.
Globals can also have a :ref:`DLL storage class <dllstorageclass>`.
+Variables and aliasaes can have a
+:ref:`Thread Local Storage Model <tls_model>`.
+
Syntax::
[@<GlobalVarName> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal]
- [AddrSpace] [unnamed_addr] [ExternallyInitialized]
- <global | constant> <Type>
+ [unnamed_addr] [AddrSpace] [ExternallyInitialized]
+ <global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, align <Alignment>]
For example, the following defines a global in a numbered address space
:ref:`parameter attribute <paramattrs>` for the return type, a function
name, a (possibly empty) argument list (each with optional :ref:`parameter
attributes <paramattrs>`), optional :ref:`function attributes <fnattrs>`,
-an optional section, an optional alignment, an optional :ref:`garbage
-collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
+an optional section, an optional alignment,
+an optional :ref:`comdat <langref_comdats>`,
+an optional :ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, an opening
curly brace, a list of basic blocks, and a closing curly brace.
LLVM function declarations consist of the "``declare``" keyword, an
LLVM allows an explicit section to be specified for functions. If the
target supports it, it will emit functions to the section specified.
+Additionally, the function can placed in a COMDAT.
An explicit alignment may be specified for a function. If not present,
or if the alignment is set to zero, the alignment of the function is set
define [linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
- [unnamed_addr] [fn Attrs] [section "name"] [align N]
- [gc] [prefix Constant] { ... }
+ [unnamed_addr] [fn Attrs] [section "name"] [comdat $<ComdatName>]
+ [align N] [gc] [prefix Constant] { ... }
+
+The argument list is a comma seperated sequence of arguments where each
+argument is of the following form
+
+Syntax::
+
+ <type> [parameter Attrs] [name]
+
.. _langref_aliases:
Aliases
-------
-Aliases act as "second name" for the aliasee value (which can be either
-function, global variable, another alias or bitcast of global value).
+Aliases, unlike function or variables, don't create any new data. They
+are just a new symbol and metadata for an existing position.
+
+Aliases have a name and an aliasee that is either a global value or a
+constant expression.
+
Aliases may have an optional :ref:`linkage type <linkage>`, an optional
-:ref:`visibility style <visibility>`, and an optional :ref:`DLL storage class
-<dllstorageclass>`.
+:ref:`visibility style <visibility>`, an optional :ref:`DLL storage class
+<dllstorageclass>` and an optional :ref:`tls model <tls_model>`.
Syntax::
- @<Name> = [Visibility] [DLLStorageClass] alias [Linkage] <AliaseeTy> @<Aliasee>
+ @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy> @<Aliasee>
The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``,
``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers
-might not correctly handle dropping a weak symbol that is aliased by a non-weak
-alias.
+might not correctly handle dropping a weak symbol that is aliased.
Alias that are not ``unnamed_addr`` are guaranteed to have the same address as
-the aliasee.
+the aliasee expression. ``unnamed_addr`` ones are only guaranteed to point
+to the same content.
+
+Since aliases are only a second name, some restrictions apply, of which
+some can only be checked when producing an object file:
+
+* The expression defining the aliasee must be computable at assembly
+ time. Since it is just a name, no relocations can be used.
+
+* No alias in the expression can be weak as the possibility of the
+ intermediate alias being overridden cannot be represented in an
+ object file.
+
+* No global value in the expression can be a declaration, since that
+ would require a relocation, which is not possible.
+
+.. _langref_comdats:
+
+Comdats
+-------
+
+Comdat IR provides access to COFF and ELF object file COMDAT functionality.
-The aliasee must be a definition.
+Comdats have a name which represents the COMDAT key. All global objects that
+specify this key will only end up in the final object file if the linker chooses
+that key over some other key. Aliases are placed in the same COMDAT that their
+aliasee computes to, if any.
+
+Comdats have a selection kind to provide input on how the linker should
+choose between keys in two different object files.
+
+Syntax::
+
+ $<Name> = comdat SelectionKind
+
+The selection kind must be one of the following:
+
+``any``
+ The linker may choose any COMDAT key, the choice is arbitrary.
+``exactmatch``
+ The linker may choose any COMDAT key but the sections must contain the
+ same data.
+``largest``
+ The linker will choose the section containing the largest COMDAT key.
+``noduplicates``
+ The linker requires that only section with this COMDAT key exist.
+``samesize``
+ The linker may choose any COMDAT key but the sections must contain the
+ same amount of data.
+
+Note that the Mach-O platform doesn't support COMDATs and ELF only supports
+``any`` as a selection kind.
+
+Here is an example of a COMDAT group where a function will only be selected if
+the COMDAT key's section is the largest:
+
+.. code-block:: llvm
+
+ $foo = comdat largest
+ @foo = global i32 2, comdat $foo
+
+ define void @bar() comdat $foo {
+ ret void
+ }
+
+In a COFF object file, this will create a COMDAT section with selection kind
+``IMAGE_COMDAT_SELECT_LARGEST`` containing the contents of the ``@foo`` symbol
+and another COMDAT section with selection kind
+``IMAGE_COMDAT_SELECT_ASSOCIATIVE`` which is associated with the first COMDAT
+section and contains the contents of the ``@baz`` symbol.
+
+There are some restrictions on the properties of the global object.
+It, or an alias to it, must have the same name as the COMDAT group when
+targeting COFF.
+The contents and size of this object may be used during link-time to determine
+which COMDAT groups get selected depending on the selection kind.
+Because the name of the object must match the name of the COMDAT group, the
+linkage of the global object must not be local; local symbols can get renamed
+if a collision occurs in the symbol table.
+
+The combined use of COMDATS and section attributes may yield surprising results.
+For example:
+
+.. code-block:: llvm
+
+ $foo = comdat any
+ $bar = comdat any
+ @g1 = global i32 42, section "sec", comdat $foo
+ @g2 = global i32 42, section "sec", comdat $bar
+
+From the object file perspective, this requires the creation of two sections
+with the same name. This is necessary because both globals belong to different
+COMDAT groups and COMDATs, at the object file level, are represented by
+sections.
+
+Note that certain IR constructs like global variables and functions may create
+COMDATs in the object file in addition to any which are specified using COMDAT
+IR. This arises, for example, when a global variable has linkonce_odr linkage.
.. _namedmetadatastructure:
``inalloca``
-.. Warning:: This feature is unstable and not fully implemented.
-
The ``inalloca`` argument attribute allows the caller to take the
address of outgoing stack arguments. An ``inalloca`` argument must
be a pointer to stack memory produced by an ``alloca`` instruction.
The alloca, or argument allocation, must also be tagged with the
- inalloca keyword. Only the past argument may have the ``inalloca``
+ inalloca keyword. Only the last argument may have the ``inalloca``
attribute, and that argument is guaranteed to be passed in memory.
An argument allocation may be used by a call at most once because
not to trap and to be properly aligned. This may only be applied to
the first parameter. This is not a valid attribute for return
values.
+
+``align <n>``
+ This indicates that the pointer value may be assumed by the optimizer to
+ have the specified alignment.
+
+ Note that this attribute has additional semantics when combined with the
+ ``byval`` attribute.
+
+.. _noalias:
+
``noalias``
This indicates that pointer values :ref:`based <pointeraliasing>` on
- the argument or return value do not alias pointer values which are
+ the argument or return value do not alias pointer values that are
not *based* on it, ignoring certain "irrelevant" dependencies. For a
call to the parent function, dependencies between memory references
from before or after the call and from those during the call are
"irrelevant" to the ``noalias`` keyword for the arguments and return
value used in that call. The caller shares the responsibility with
the callee for ensuring that these requirements are met. For further
- details, please see the discussion of the NoAlias response in `alias
- analysis <AliasAnalysis.html#MustMayNo>`_.
+ details, please see the discussion of the NoAlias response in :ref:`alias
+ analysis <Must, May, or No>`.
Note that this definition of ``noalias`` is intentionally similar
to the definition of ``restrict`` in C99 for function arguments,
operands for the :ref:`bitcast instruction <i_bitcast>`. This is not a
valid attribute for return values and can only be applied to one parameter.
+``nonnull``
+ This indicates that the parameter or return pointer is not null. This
+ attribute may only be applied to pointer typed parameters. This is not
+ checked or enforced by LLVM, the caller must ensure that the pointer
+ passed in is non-null, or the callee must ensure that the returned pointer
+ is non-null.
+
+``dereferenceable(<n>)``
+ This indicates that the parameter or return pointer is dereferenceable. This
+ attribute may only be applied to pointer typed parameters. A pointer that
+ is dereferenceable can be loaded from speculatively without a risk of
+ trapping. The number of bytes known to be dereferenceable must be provided
+ in parentheses. It is legal for the number of bytes to be less than the
+ size of the pointee type. The ``nonnull`` attribute does not imply
+ dereferenceability (consider a pointer to one element past the end of an
+ array), however ``dereferenceable(<n>)`` does imply ``nonnull`` in
+ ``addrspace(0)`` (which is the default address space).
+
.. _gc:
Garbage Collector Names
define void @f() gc "name" { ... }
The compiler declares the supported values of *name*. Specifying a
-collector which will cause the compiler to alter its output in order to
+collector will cause the compiler to alter its output in order to
support the named garbage collection algorithm.
.. _prefixdata:
This indicates that the callee function at a call site should be
recognized as a built-in function, even though the function's declaration
uses the ``nobuiltin`` attribute. This is only valid at call sites for
- direct calls to functions which are declared with the ``nobuiltin``
+ direct calls to functions that are declared with the ``nobuiltin``
attribute.
``cold``
This attribute indicates that this function is rarely called. When
inlining this function is desirable (such as the "inline" keyword in
C/C++). It is just a hint; it imposes no requirements on the
inliner.
+``jumptable``
+ This attribute indicates that the function should be added to a
+ jump-instruction table at code-generation time, and that all address-taken
+ references to this function should be replaced with a reference to the
+ appropriate jump-instruction-table function pointer. Note that this creates
+ a new pointer for the original function, which means that code that depends
+ on function-pointer identity can break. So, any function annotated with
+ ``jumptable`` must also be ``unnamed_addr``.
``minsize``
This attribute suggests that optimization passes and code generator
passes make choices that keep the code size of this function as small
- If R is volatile, the result is target-dependent. (Volatile is
supposed to give guarantees which can support ``sig_atomic_t`` in
- C/C++, and may be used for accesses to addresses which do not behave
+ C/C++, and may be used for accesses to addresses that do not behave
like normal memory. It does not generally provide cross-thread
synchronization.)
- Otherwise, if there is no write to the same byte that happens before
address. This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
``seq_cst`` (sequentially consistent)
In addition to the guarantees of ``acq_rel`` (``acquire`` for an
- operation which only reads, ``release`` for an operation which only
+ operation that only reads, ``release`` for an operation that only
writes), there is a global total order on all
sequentially-consistent operations on all addresses, which is
consistent with the *happens-before* partial order and with the
dramatically change results in floating point (e.g. reassociate). This
flag implies all the others.
+.. _uselistorder:
+
+Use-list Order Directives
+-------------------------
+
+Use-list directives encode the in-memory order of each use-list, allowing the
+order to be recreated. ``<order-indexes>`` is a comma-separated list of
+indexes that are assigned to the referenced value's uses. The referenced
+value's use-list is immediately sorted by these indexes.
+
+Use-list directives may appear at function scope or global scope. They are not
+instructions, and have no effect on the semantics of the IR. When they're at
+function scope, they must appear after the terminator of the final basic block.
+
+If basic blocks have their address taken via ``blockaddress()`` expressions,
+``uselistorder_bb`` can be used to reorder their use-lists from outside their
+function's scope.
+
+:Syntax:
+
+::
+
+ uselistorder <ty> <value>, { <order-indexes> }
+ uselistorder_bb @function, %block { <order-indexes> }
+
+:Examples:
+
+::
+
+ define void @foo(i32 %arg1, i32 %arg2) {
+ entry:
+ ; ... instructions ...
+ bb:
+ ; ... instructions ...
+
+ ; At function scope.
+ uselistorder i32 %arg1, { 1, 0, 2 }
+ uselistorder label %bb, { 1, 0 }
+ }
+
+ ; At global scope.
+ uselistorder i32* @global, { 1, 2, 0 }
+ uselistorder i32 7, { 1, 0 }
+ uselistorder i32 (i32) @bar, { 1, 0 }
+ uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 }
+
.. _typesystem:
Type System
< <# elements> x <elementtype> >
The number of elements is a constant integer value larger than 0;
-elementtype may be any integer or floating point type, or a pointer to
-these types. Vectors of size zero are not allowed.
+elementtype may be any integer, floating point or pointer type. Vectors
+of size zero are not allowed.
:Examples:
| ``opaque`` | An opaque type. |
+--------------+-------------------+
+.. _constants:
+
Constants
=========
Poison values are similar to :ref:`undef values <undefvalues>`, however
they also represent the fact that an instruction or constant expression
-which cannot evoke side effects has nevertheless detected a condition
-which results in undefined behavior.
+that cannot evoke side effects has nevertheless detected a condition
+that results in undefined behavior.
There is currently no way of representing a poison value in the IR; they
only exist when produced by operations such as :ref:`add <i_add>` with
successor.
- Dependence is transitive.
-Poison Values have the same behavior as :ref:`undef values <undefvalues>`,
-with the additional affect that any instruction which has a *dependence*
+Poison values have the same behavior as :ref:`undef values <undefvalues>`,
+with the additional effect that any instruction that has a *dependence*
on a poison value has undefined behavior.
Here are some examples:
4 byte gap between the two fields. This gap represents padding which
does not carry useful data and need not be preserved.
+'``noalias``' and '``alias.scope``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``noalias`` and ``alias.scope`` metadata provide the ability to specify generic
+noalias memory-access sets. This means that some collection of memory access
+instructions (loads, stores, memory-accessing calls, etc.) that carry
+``noalias`` metadata can specifically be specified not to alias with some other
+collection of memory access instructions that carry ``alias.scope`` metadata.
+Each type of metadata specifies a list of scopes where each scope has an id and
+a domain. When evaluating an aliasing query, if for some some domain, the set
+of scopes with that domain in one instruction's ``alias.scope`` list is a
+subset of (or qual to) the set of scopes for that domain in another
+instruction's ``noalias`` list, then the two memory accesses are assumed not to
+alias.
+
+The metadata identifying each domain is itself a list containing one or two
+entries. The first entry is the name of the domain. Note that if the name is a
+string then it can be combined accross functions and translation units. A
+self-reference can be used to create globally unique domain names. A
+descriptive string may optionally be provided as a second list entry.
+
+The metadata identifying each scope is also itself a list containing two or
+three entries. The first entry is the name of the scope. Note that if the name
+is a string then it can be combined accross functions and translation units. A
+self-reference can be used to create globally unique scope names. A metadata
+reference to the scope's domain is the second entry. A descriptive string may
+optionally be provided as a third list entry.
+
+For example,
+
+.. code-block:: llvm
+
+ ; Two scope domains:
+ !0 = metadata !{metadata !0}
+ !1 = metadata !{metadata !1}
+
+ ; Some scopes in these domains:
+ !2 = metadata !{metadata !2, metadata !0}
+ !3 = metadata !{metadata !3, metadata !0}
+ !4 = metadata !{metadata !4, metadata !1}
+
+ ; Some scope lists:
+ !5 = metadata !{metadata !4} ; A list containing only scope !4
+ !6 = metadata !{metadata !4, metadata !3, metadata !2}
+ !7 = metadata !{metadata !3}
+
+ ; These two instructions don't alias:
+ %0 = load float* %c, align 4, !alias.scope !5
+ store float %0, float* %arrayidx.i, align 4, !noalias !5
+
+ ; These two instructions also don't alias (for domain !1, the set of scopes
+ ; in the !alias.scope equals that in the !noalias list):
+ %2 = load float* %c, align 4, !alias.scope !5
+ store float %2, float* %arrayidx.i2, align 4, !noalias !6
+
+ ; These two instructions don't alias (for domain !0, the set of scopes in
+ ; the !noalias list is not a superset of, or equal to, the scopes in the
+ ; !alias.scope list):
+ %2 = load float* %c, align 4, !alias.scope !6
+ store float %0, float* %arrayidx.i, align 4, !noalias !7
+
'``fpmath``' Metadata
^^^^^^^^^^^^^^^^^^^^^
'``range``' Metadata
^^^^^^^^^^^^^^^^^^^^
-``range`` metadata may be attached only to loads of integer types. It
-expresses the possible ranges the loaded value is in. The ranges are
-represented with a flattened list of integers. The loaded value is known
-to be in the union of the ranges defined by each consecutive pair. Each
-pair has the following properties:
+``range`` metadata may be attached only to ``load``, ``call`` and ``invoke`` of
+integer types. It expresses the possible ranges the loaded value or the value
+returned by the called function at this call site is in. The ranges are
+represented with a flattened list of integers. The loaded value or the value
+returned is known to be in the union of the ranges defined by each consecutive
+pair. Each pair has the following properties:
- The type must match the type loaded by the instruction.
- The pair ``a,b`` represents the range ``[a,b)``.
%a = load i8* %x, align 1, !range !0 ; Can only be 0 or 1
%b = load i8* %y, align 1, !range !1 ; Can only be 255 (-1), 0 or 1
- %c = load i8* %z, align 1, !range !2 ; Can only be 0, 1, 3, 4 or 5
- %d = load i8* %z, align 1, !range !3 ; Can only be -2, -1, 3, 4 or 5
+ %c = call i8 @foo(), !range !2 ; Can only be 0, 1, 3, 4 or 5
+ %d = invoke i8 @bar() to label %cont
+ unwind label %lpad, !range !3 ; Can only be -2, -1, 3, 4 or 5
...
!0 = metadata !{ i8 0, i8 2 }
!1 = metadata !{ i8 255, i8 2 }
!0 = metadata !{ metadata !0 }
!1 = metadata !{ metadata !1 }
-The loop identifier metadata can be used to specify additional per-loop
-metadata. Any operands after the first operand can be treated as user-defined
-metadata. For example the ``llvm.vectorizer.unroll`` metadata is understood
-by the loop vectorizer to indicate how many times to unroll the loop:
+The loop identifier metadata can be used to specify additional
+per-loop metadata. Any operands after the first operand can be treated
+as user-defined metadata. For example the ``llvm.loop.unroll.count``
+suggests an unroll factor to the loop unroller:
.. code-block:: llvm
br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
...
!0 = metadata !{ metadata !0, metadata !1 }
- !1 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 2 }
+ !1 = metadata !{ metadata !"llvm.loop.unroll.count", i32 4 }
+
+'``llvm.loop.vectorize``' and '``llvm.loop.interleave``'
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Metadata prefixed with ``llvm.loop.vectorize`` or ``llvm.loop.interleave`` are
+used to control per-loop vectorization and interleaving parameters such as
+vectorization width and interleave count. These metadata should be used in
+conjunction with ``llvm.loop`` loop identification metadata. The
+``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
+optimization hints and the optimizer will only interleave and vectorize loops if
+it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata
+which contains information about loop-carried memory dependencies can be helpful
+in determining the safety of these transformations.
+
+'``llvm.loop.interleave.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests an interleave count to the loop interleaver.
+The first operand is the string ``llvm.loop.interleave.count`` and the
+second operand is an integer specifying the interleave count. For
+example:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.interleave.count", i32 4 }
+
+Note that setting ``llvm.loop.interleave.count`` to 1 disables interleaving
+multiple iterations of the loop. If ``llvm.loop.interleave.count`` is set to 0
+then the interleave count will be determined automatically.
+
+'``llvm.loop.vectorize.enable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata selectively enables or disables vectorization for the loop. The
+first operand is the string ``llvm.loop.vectorize.enable`` and the second operand
+is a bit. If the bit operand value is 1 vectorization is enabled. A value of
+0 disables vectorization:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 0 }
+ !1 = metadata !{ metadata !"llvm.loop.vectorize.enable", i1 1 }
+
+'``llvm.loop.vectorize.width``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata sets the target width of the vectorizer. The first
+operand is the string ``llvm.loop.vectorize.width`` and the second
+operand is an integer specifying the width. For example:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.vectorize.width", i32 4 }
+
+Note that setting ``llvm.loop.vectorize.width`` to 1 disables
+vectorization of the loop. If ``llvm.loop.vectorize.width`` is set to
+0 or if the loop does not have this metadata the width will be
+determined automatically.
+
+'``llvm.loop.unroll``'
+^^^^^^^^^^^^^^^^^^^^^^
+
+Metadata prefixed with ``llvm.loop.unroll`` are loop unrolling
+optimization hints such as the unroll factor. ``llvm.loop.unroll``
+metadata should be used in conjunction with ``llvm.loop`` loop
+identification metadata. The ``llvm.loop.unroll`` metadata are only
+optimization hints and the unrolling will only be performed if the
+optimizer believes it is safe to do so.
+
+'``llvm.loop.unroll.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests an unroll factor to the loop unroller. The
+first operand is the string ``llvm.loop.unroll.count`` and the second
+operand is a positive integer specifying the unroll factor. For
+example:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.unroll.count", i32 4 }
+
+If the trip count of the loop is less than the unroll count the loop
+will be partially unrolled.
+
+'``llvm.loop.unroll.disable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata either disables loop unrolling. The metadata has a single operand
+which is the string ``llvm.loop.unroll.disable``. For example:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.unroll.disable" }
+
+'``llvm.loop.unroll.full``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata either suggests that the loop should be unrolled fully. The
+metadata has a single operand which is the string ``llvm.loop.unroll.disable``.
+For example:
+
+.. code-block:: llvm
+
+ !0 = metadata !{ metadata !"llvm.loop.unroll.full" }
'``llvm.mem``'
^^^^^^^^^^^^^^^
'``llvm.mem.parallel_loop_access``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-For a loop to be parallel, in addition to using
-the ``llvm.loop`` metadata to mark the loop latch branch instruction,
-also all of the memory accessing instructions in the loop body need to be
-marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
-is at least one memory accessing instruction not marked with the metadata,
-the loop must be considered a sequential loop. This causes parallel loops to be
-converted to sequential loops due to optimization passes that are unaware of
-the parallel semantics and that insert new memory instructions to the loop
-body.
+The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
+or metadata containing a list of loop identifiers for nested loops.
+The metadata is attached to memory accessing instructions and denotes that
+no loop carried memory dependence exist between it and other instructions denoted
+with the same loop identifier.
+
+Precisely, given two instructions ``m1`` and ``m2`` that both have the
+``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
+set of loops associated with that metadata, respectively, then there is no loop
+carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and
+``L2``.
+
+As a special case, if all memory accessing instructions in a loop have
+``llvm.mem.parallel_loop_access`` metadata that refers to that loop, then the
+loop has no loop carried memory dependences and is considered to be a parallel
+loop.
+
+Note that if not all memory access instructions have such metadata referring to
+the loop, then the loop is considered not being trivially parallel. Additional
+memory dependence analysis is required to make that determination. As a fail
+safe mechanism, this causes loops that were originally parallel to be considered
+sequential (if optimization passes that are unaware of the parallel semantics
+insert new memory instructions into the loop body).
Example of a loop that is considered parallel due to its correct use of
both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
!1 = metadata !{ metadata !1 } ; an identifier for the inner loop
!2 = metadata !{ metadata !2 } ; an identifier for the outer loop
-'``llvm.vectorizer``'
-^^^^^^^^^^^^^^^^^^^^^
-
-Metadata prefixed with ``llvm.vectorizer`` is used to control per-loop
-vectorization parameters such as vectorization factor and unroll factor.
-
-``llvm.vectorizer`` metadata should be used in conjunction with ``llvm.loop``
-loop identification metadata.
-
-'``llvm.vectorizer.unroll``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-This metadata instructs the loop vectorizer to unroll the specified
-loop exactly ``N`` times.
-
-The first operand is the string ``llvm.vectorizer.unroll`` and the second
-operand is an integer specifying the unroll factor. For example:
-
-.. code-block:: llvm
-
- !0 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 4 }
-
-Note that setting ``llvm.vectorizer.unroll`` to 1 disables unrolling of the
-loop.
-
-If ``llvm.vectorizer.unroll`` is set to 0 then the amount of unrolling will be
-determined automatically.
-
-'``llvm.vectorizer.width``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-This metadata sets the target width of the vectorizer to ``N``. Without
-this metadata, the vectorizer will choose a width automatically.
-Regardless of this metadata, the vectorizer will only vectorize loops if
-it believes it is valid to do so.
-
-The first operand is the string ``llvm.vectorizer.width`` and the second
-operand is an integer specifying the width. For example:
-
-.. code-block:: llvm
-
- !0 = metadata !{ metadata !"llvm.vectorizer.width", i32 4 }
-
-Note that setting ``llvm.vectorizer.width`` to 1 disables vectorization of the
-loop.
-
-If ``llvm.vectorizer.width`` is set to 0 then the width will be determined
-automatically.
-
Module Flags Metadata
=====================
linker, or an option that is reserved by the target specific assembly writer or
object file emitter. No other aspect of these options is defined by the IR.
+C type width Module Flags Metadata
+----------------------------------
+
+The ARM backend emits a section into each generated object file describing the
+options that it was compiled with (in a compiler-independent way) to prevent
+linking incompatible objects, and to allow automatic library selection. Some
+of these options are not visible at the IR level, namely wchar_t width and enum
+width.
+
+To pass this information to the backend, these options are encoded in module
+flags metadata, using the following key-value pairs:
+
+.. list-table::
+ :header-rows: 1
+ :widths: 30 70
+
+ * - Key
+ - Value
+
+ * - short_wchar
+ - * 0 --- sizeof(wchar_t) == 4
+ * 1 --- sizeof(wchar_t) == 2
+
+ * - short_enum
+ - * 0 --- Enums are at least as large as an ``int``.
+ * 1 --- Enums are stored in the smallest integer type which can
+ represent all of its values.
+
+For example, the following metadata section specifies that the module was
+compiled with a ``wchar_t`` width of 4 bytes, and the underlying type of an
+enum is the smallest type which can represent all of its values::
+
+ !llvm.module.flags = !{!0, !1}
+ !0 = metadata !{i32 1, metadata !"short_wchar", i32 1}
+ !1 = metadata !{i32 1, metadata !"short_enum", i32 0}
+
.. _intrinsicglobalvariables:
Intrinsic Global Variables
.. code-block:: llvm
- %0 = type { i32, void ()* }
- @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor }]
+ %0 = type { i32, void ()*, i8* }
+ @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor, i8* @data }]
The ``@llvm.global_ctors`` array contains a list of constructor
-functions and associated priorities. The functions referenced by this
-array will be called in ascending order of priority (i.e. lowest first)
-when the module is loaded. The order of functions with the same priority
-is not defined.
+functions, priorities, and an optional associated global or function.
+The functions referenced by this array will be called in ascending order
+of priority (i.e. lowest first) when the module is loaded. The order of
+functions with the same priority is not defined.
+
+If the third field is present, non-null, and points to a global variable
+or function, the initializer function will only run if the associated
+data from the current module is not discarded.
.. _llvmglobaldtors:
.. code-block:: llvm
- %0 = type { i32, void ()* }
- @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor }]
+ %0 = type { i32, void ()*, i8* }
+ @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor, i8* @data }]
+
+The ``@llvm.global_dtors`` array contains a list of destructor
+functions, priorities, and an optional associated global or function.
+The functions referenced by this array will be called in descending
+order of priority (i.e. highest first) when the module is unloaded. The
+order of functions with the same priority is not defined.
-The ``@llvm.global_dtors`` array contains a list of destructor functions
-and associated priorities. The functions referenced by this array will
-be called in descending order of priority (i.e. highest first) when the
-module is loaded. The order of functions with the same priority is not
-defined.
+If the third field is present, non-null, and points to a global variable
+or function, the destructor function will only run if the associated
+data from the current module is not discarded.
Instruction Reference
=====================
.. code-block:: llvm
%retval = invoke i32 @Test(i32 15) to label %Continue
- unwind label %TestCleanup ; {i32}:retval set
+ unwind label %TestCleanup ; i32:retval set
%retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
- unwind label %TestCleanup ; {i32}:retval set
+ unwind label %TestCleanup ; i32:retval set
.. _i_resume:
::
- <result> = add <ty> <op1>, <op2> ; yields {ty}:result
- <result> = add nuw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = add nsw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = add nuw nsw <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = add <ty> <op1>, <op2> ; yields ty:result
+ <result> = add nuw <ty> <op1>, <op2> ; yields ty:result
+ <result> = add nsw <ty> <op1>, <op2> ; yields ty:result
+ <result> = add nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = add i32 4, %var ; yields {i32}:result = 4 + %var
+ <result> = add i32 4, %var ; yields i32:result = 4 + %var
.. _i_fadd:
::
- <result> = fadd [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = fadd [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = fadd float 4.0, %var ; yields {float}:result = 4.0 + %var
+ <result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var
'``sub``' Instruction
^^^^^^^^^^^^^^^^^^^^^
::
- <result> = sub <ty> <op1>, <op2> ; yields {ty}:result
- <result> = sub nuw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = sub nsw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = sub nuw nsw <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = sub <ty> <op1>, <op2> ; yields ty:result
+ <result> = sub nuw <ty> <op1>, <op2> ; yields ty:result
+ <result> = sub nsw <ty> <op1>, <op2> ; yields ty:result
+ <result> = sub nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = sub i32 4, %var ; yields {i32}:result = 4 - %var
- <result> = sub i32 0, %val ; yields {i32}:result = -%var
+ <result> = sub i32 4, %var ; yields i32:result = 4 - %var
+ <result> = sub i32 0, %val ; yields i32:result = -%var
.. _i_fsub:
::
- <result> = fsub [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = fsub [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = fsub float 4.0, %var ; yields {float}:result = 4.0 - %var
- <result> = fsub float -0.0, %val ; yields {float}:result = -%var
+ <result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var
+ <result> = fsub float -0.0, %val ; yields float:result = -%var
'``mul``' Instruction
^^^^^^^^^^^^^^^^^^^^^
::
- <result> = mul <ty> <op1>, <op2> ; yields {ty}:result
- <result> = mul nuw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = mul nsw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = mul nuw nsw <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = mul <ty> <op1>, <op2> ; yields ty:result
+ <result> = mul nuw <ty> <op1>, <op2> ; yields ty:result
+ <result> = mul nsw <ty> <op1>, <op2> ; yields ty:result
+ <result> = mul nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = mul i32 4, %var ; yields {i32}:result = 4 * %var
+ <result> = mul i32 4, %var ; yields i32:result = 4 * %var
.. _i_fmul:
::
- <result> = fmul [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = fmul [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = fmul float 4.0, %var ; yields {float}:result = 4.0 * %var
+ <result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var
'``udiv``' Instruction
^^^^^^^^^^^^^^^^^^^^^^
::
- <result> = udiv <ty> <op1>, <op2> ; yields {ty}:result
- <result> = udiv exact <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = udiv <ty> <op1>, <op2> ; yields ty:result
+ <result> = udiv exact <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = udiv i32 4, %var ; yields {i32}:result = 4 / %var
+ <result> = udiv i32 4, %var ; yields i32:result = 4 / %var
'``sdiv``' Instruction
^^^^^^^^^^^^^^^^^^^^^^
::
- <result> = sdiv <ty> <op1>, <op2> ; yields {ty}:result
- <result> = sdiv exact <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = sdiv <ty> <op1>, <op2> ; yields ty:result
+ <result> = sdiv exact <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = sdiv i32 4, %var ; yields {i32}:result = 4 / %var
+ <result> = sdiv i32 4, %var ; yields i32:result = 4 / %var
.. _i_fdiv:
::
- <result> = fdiv [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = fdiv [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = fdiv float 4.0, %var ; yields {float}:result = 4.0 / %var
+ <result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var
'``urem``' Instruction
^^^^^^^^^^^^^^^^^^^^^^
::
- <result> = urem <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = urem <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = urem i32 4, %var ; yields {i32}:result = 4 % %var
+ <result> = urem i32 4, %var ; yields i32:result = 4 % %var
'``srem``' Instruction
^^^^^^^^^^^^^^^^^^^^^^
::
- <result> = srem <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = srem <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = srem i32 4, %var ; yields {i32}:result = 4 % %var
+ <result> = srem i32 4, %var ; yields i32:result = 4 % %var
.. _i_frem:
::
- <result> = frem [fast-math flags]* <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = frem [fast-math flags]* <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = frem float 4.0, %var ; yields {float}:result = 4.0 % %var
+ <result> = frem float 4.0, %var ; yields float:result = 4.0 % %var
.. _bitwiseops:
::
- <result> = shl <ty> <op1>, <op2> ; yields {ty}:result
- <result> = shl nuw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = shl nsw <ty> <op1>, <op2> ; yields {ty}:result
- <result> = shl nuw nsw <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = shl <ty> <op1>, <op2> ; yields ty:result
+ <result> = shl nuw <ty> <op1>, <op2> ; yields ty:result
+ <result> = shl nsw <ty> <op1>, <op2> ; yields ty:result
+ <result> = shl nuw nsw <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = shl i32 4, %var ; yields {i32}: 4 << %var
- <result> = shl i32 4, 2 ; yields {i32}: 16
- <result> = shl i32 1, 10 ; yields {i32}: 1024
+ <result> = shl i32 4, %var ; yields i32: 4 << %var
+ <result> = shl i32 4, 2 ; yields i32: 16
+ <result> = shl i32 1, 10 ; yields i32: 1024
<result> = shl i32 1, 32 ; undefined
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4>
::
- <result> = lshr <ty> <op1>, <op2> ; yields {ty}:result
- <result> = lshr exact <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = lshr <ty> <op1>, <op2> ; yields ty:result
+ <result> = lshr exact <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = lshr i32 4, 1 ; yields {i32}:result = 2
- <result> = lshr i32 4, 2 ; yields {i32}:result = 1
- <result> = lshr i8 4, 3 ; yields {i8}:result = 0
- <result> = lshr i8 -2, 1 ; yields {i8}:result = 0x7F
+ <result> = lshr i32 4, 1 ; yields i32:result = 2
+ <result> = lshr i32 4, 2 ; yields i32:result = 1
+ <result> = lshr i8 4, 3 ; yields i8:result = 0
+ <result> = lshr i8 -2, 1 ; yields i8:result = 0x7F
<result> = lshr i32 1, 32 ; undefined
<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
::
- <result> = ashr <ty> <op1>, <op2> ; yields {ty}:result
- <result> = ashr exact <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = ashr <ty> <op1>, <op2> ; yields ty:result
+ <result> = ashr exact <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = ashr i32 4, 1 ; yields {i32}:result = 2
- <result> = ashr i32 4, 2 ; yields {i32}:result = 1
- <result> = ashr i8 4, 3 ; yields {i8}:result = 0
- <result> = ashr i8 -2, 1 ; yields {i8}:result = -1
+ <result> = ashr i32 4, 1 ; yields i32:result = 2
+ <result> = ashr i32 4, 2 ; yields i32:result = 1
+ <result> = ashr i8 4, 3 ; yields i8:result = 0
+ <result> = ashr i8 -2, 1 ; yields i8:result = -1
<result> = ashr i32 1, 32 ; undefined
<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>
::
- <result> = and <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = and <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = and i32 4, %var ; yields {i32}:result = 4 & %var
- <result> = and i32 15, 40 ; yields {i32}:result = 8
- <result> = and i32 4, 8 ; yields {i32}:result = 0
+ <result> = and i32 4, %var ; yields i32:result = 4 & %var
+ <result> = and i32 15, 40 ; yields i32:result = 8
+ <result> = and i32 4, 8 ; yields i32:result = 0
'``or``' Instruction
^^^^^^^^^^^^^^^^^^^^
::
- <result> = or <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = or <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
::
- <result> = or i32 4, %var ; yields {i32}:result = 4 | %var
- <result> = or i32 15, 40 ; yields {i32}:result = 47
- <result> = or i32 4, 8 ; yields {i32}:result = 12
+ <result> = or i32 4, %var ; yields i32:result = 4 | %var
+ <result> = or i32 15, 40 ; yields i32:result = 47
+ <result> = or i32 4, 8 ; yields i32:result = 12
'``xor``' Instruction
^^^^^^^^^^^^^^^^^^^^^
::
- <result> = xor <ty> <op1>, <op2> ; yields {ty}:result
+ <result> = xor <ty> <op1>, <op2> ; yields ty:result
Overview:
"""""""""
.. code-block:: llvm
- <result> = xor i32 4, %var ; yields {i32}:result = 4 ^ %var
- <result> = xor i32 15, 40 ; yields {i32}:result = 39
- <result> = xor i32 4, 8 ; yields {i32}:result = 12
- <result> = xor i32 %V, -1 ; yields {i32}:result = ~%V
+ <result> = xor i32 4, %var ; yields i32:result = 4 ^ %var
+ <result> = xor i32 15, 40 ; yields i32:result = 39
+ <result> = xor i32 4, 8 ; yields i32:result = 12
+ <result> = xor i32 %V, -1 ; yields i32:result = ~%V
Vector Operations
-----------------
::
- <result> = extractelement <n x <ty>> <val>, i32 <idx> ; yields <ty>
+ <result> = extractelement <n x <ty>> <val>, <ty2> <idx> ; yields <ty>
Overview:
"""""""""
The first operand of an '``extractelement``' instruction is a value of
:ref:`vector <t_vector>` type. The second operand is an index indicating
the position from which to extract the element. The index may be a
-variable.
+variable of any integer type.
Semantics:
""""""""""
::
- <result> = insertelement <n x <ty>> <val>, <ty> <elt>, i32 <idx> ; yields <n x <ty>>
+ <result> = insertelement <n x <ty>> <val>, <ty> <elt>, <ty2> <idx> ; yields <n x <ty>>
Overview:
"""""""""
:ref:`vector <t_vector>` type. The second operand is a scalar value whose
type must equal the element type of the first operand. The third operand
is an index indicating the position at which to insert the value. The
-index may be a variable.
+index may be a variable of any integer type.
Semantics:
""""""""""
::
- <result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] ; yields {type*}:result
+ <result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>] ; yields type*:result
Overview:
"""""""""
appropriate type to the program. If "NumElements" is specified, it is
the number of elements allocated, otherwise "NumElements" is defaulted
to be one. If a constant alignment is specified, the value result of the
-allocation is guaranteed to be aligned to at least that boundary. If not
-specified, or if zero, the target can choose to align the allocation on
-any convenient boundary compatible with the type.
+allocation is guaranteed to be aligned to at least that boundary. The
+alignment may not be greater than ``1 << 29``. If not specified, or if
+zero, the target can choose to align the allocation on any convenient
+boundary compatible with the type.
'``type``' may be any sized type.
.. code-block:: llvm
- %ptr = alloca i32 ; yields {i32*}:ptr
- %ptr = alloca i32, i32 4 ; yields {i32*}:ptr
- %ptr = alloca i32, i32 4, align 1024 ; yields {i32*}:ptr
- %ptr = alloca i32, align 1024 ; yields {i32*}:ptr
+ %ptr = alloca i32 ; yields i32*:ptr
+ %ptr = alloca i32, i32 4 ; yields i32*:ptr
+ %ptr = alloca i32, i32 4, align 1024 ; yields i32*:ptr
+ %ptr = alloca i32, align 1024 ; yields i32*:ptr
.. _i_load:
alignment for the target. It is the responsibility of the code emitter
to ensure that the alignment information is correct. Overestimating the
alignment results in undefined behavior. Underestimating the alignment
-may produce less efficient code. An alignment of 1 is always safe.
+may produce less efficient code. An alignment of 1 is always safe. The
+maximum possible alignment is ``1 << 29``.
The optional ``!nontemporal`` metadata must reference a single
metadata name ``<index>`` corresponding to a metadata node with one
.. code-block:: llvm
- %ptr = alloca i32 ; yields {i32*}:ptr
- store i32 3, i32* %ptr ; yields {void}
- %val = load i32* %ptr ; yields {i32}:val = i32 3
+ %ptr = alloca i32 ; yields i32*:ptr
+ store i32 3, i32* %ptr ; yields void
+ %val = load i32* %ptr ; yields i32:val = i32 3
.. _i_store:
::
- store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields {void}
- store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> ; yields {void}
+ store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>] ; yields void
+ store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> ; yields void
Overview:
"""""""""
to ensure that the alignment information is correct. Overestimating the
alignment results in undefined behavior. Underestimating the
alignment may produce less efficient code. An alignment of 1 is always
-safe.
+safe. The maximum possible alignment is ``1 << 29``.
The optional ``!nontemporal`` metadata must reference a single metadata
name ``<index>`` corresponding to a metadata node with one ``i32`` entry of
.. code-block:: llvm
- %ptr = alloca i32 ; yields {i32*}:ptr
- store i32 3, i32* %ptr ; yields {void}
- %val = load i32* %ptr ; yields {i32}:val = i32 3
+ %ptr = alloca i32 ; yields i32*:ptr
+ store i32 3, i32* %ptr ; yields void
+ %val = load i32* %ptr ; yields i32:val = i32 3
.. _i_fence:
::
- fence [singlethread] <ordering> ; yields {void}
+ fence [singlethread] <ordering> ; yields void
Overview:
"""""""""
.. code-block:: llvm
- fence acquire ; yields {void}
- fence singlethread seq_cst ; yields {void}
+ fence acquire ; yields void
+ fence singlethread seq_cst ; yields void
.. _i_cmpxchg:
::
- cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <success ordering> <failure ordering> ; yields {ty}
+ cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <success ordering> <failure ordering> ; yields { ty, i1 }
Overview:
"""""""""
The '``cmpxchg``' instruction is used to atomically modify memory. It
loads a value in memory and compares it to a given value. If they are
-equal, it stores a new value into the memory.
+equal, it tries to store a new value into the memory.
Arguments:
""""""""""
other :ref:`volatile operations <volatile>`.
The success and failure :ref:`ordering <ordering>` arguments specify how this
-``cmpxchg`` synchronizes with other atomic operations. The both ordering
-parameters must be at least ``monotonic``, the ordering constraint on failure
-must be no stronger than that on success, and the failure ordering cannot be
-either ``release`` or ``acq_rel``.
+``cmpxchg`` synchronizes with other atomic operations. Both ordering parameters
+must be at least ``monotonic``, the ordering constraint on failure must be no
+stronger than that on success, and the failure ordering cannot be either
+``release`` or ``acq_rel``.
The optional "``singlethread``" argument declares that the ``cmpxchg``
is only atomic with respect to code (usually signal handlers) running in
Semantics:
""""""""""
-The contents of memory at the location specified by the '``<pointer>``'
-operand is read and compared to '``<cmp>``'; if the read value is the
-equal, '``<new>``' is written. The original value at the location is
-returned.
+The contents of memory at the location specified by the '``<pointer>``' operand
+is read and compared to '``<cmp>``'; if the read value is the equal, the
+'``<new>``' is written. The original value at the location is returned, together
+with a flag indicating success (true) or failure (false).
+
+If the cmpxchg operation is marked as ``weak`` then a spurious failure is
+permitted: the operation may not write ``<new>`` even if the comparison
+matched.
+
+If the cmpxchg operation is strong (the default), the i1 value is 1 if and only
+if the value loaded equals ``cmp``.
A successful ``cmpxchg`` is a read-modify-write instruction for the purpose of
identifying release sequences. A failed ``cmpxchg`` is equivalent to an atomic
.. code-block:: llvm
entry:
- %orig = atomic load i32* %ptr unordered ; yields {i32}
+ %orig = atomic load i32* %ptr unordered ; yields i32
br label %loop
loop:
%cmp = phi i32 [ %orig, %entry ], [%old, %loop]
%squared = mul i32 %cmp, %cmp
- %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields {i32}
- %success = icmp eq i32 %cmp, %old
+ %val_success = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 }
+ %value_loaded = extractvalue { i32, i1 } %val_success, 0
+ %success = extractvalue { i32, i1 } %val_success, 1
br i1 %success, label %done, label %loop
done:
::
- atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> ; yields {ty}
+ atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> ; yields ty
Overview:
"""""""""
.. code-block:: llvm
- %old = atomicrmw add i32* %ptr, i32 1 acquire ; yields {i32}
+ %old = atomicrmw add i32* %ptr, i32 1 acquire ; yields i32
.. _i_getelementptr:
::
- <result> = icmp <cond> <ty> <op1>, <op2> ; yields {i1} or {<N x i1>}:result
+ <result> = icmp <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
Overview:
"""""""""
::
- <result> = fcmp <cond> <ty> <op1>, <op2> ; yields {i1} or {<N x i1>}:result
+ <result> = fcmp <cond> <ty> <op1>, <op2> ; yields i1 or <N x i1>:result
Overview:
"""""""""
"""""""""
The '``select``' instruction is used to choose one value based on a
-condition, without branching.
+condition, without IR-level branching.
Arguments:
""""""""""
::
- <result> = [tail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs]
+ <result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs]
Overview:
"""""""""
This instruction requires several arguments:
-#. The optional "tail" marker indicates that the callee function does
- not access any allocas or varargs in the caller. Note that calls may
- be marked "tail" even if they do not occur before a
- :ref:`ret <i_ret>` instruction. If the "tail" marker is present, the
- function call is eligible for tail call optimization, but `might not
- in fact be optimized into a jump <CodeGenerator.html#tailcallopt>`_.
- The code generator may optimize calls marked "tail" with either 1)
- automatic `sibling call
- optimization <CodeGenerator.html#sibcallopt>`_ when the caller and
- callee have matching signatures, or 2) forced tail call optimization
- when the following extra requirements are met:
+#. The optional ``tail`` and ``musttail`` markers indicate that the optimizers
+ should perform tail call optimization. The ``tail`` marker is a hint that
+ `can be ignored <CodeGenerator.html#sibcallopt>`_. The ``musttail`` marker
+ means that the call must be tail call optimized in order for the program to
+ be correct. The ``musttail`` marker provides these guarantees:
+
+ #. The call will not cause unbounded stack growth if it is part of a
+ recursive cycle in the call graph.
+ #. Arguments with the :ref:`inalloca <attr_inalloca>` attribute are
+ forwarded in place.
+
+ Both markers imply that the callee does not access allocas or varargs from
+ the caller. Calls marked ``musttail`` must obey the following additional
+ rules:
+
+ - The call must immediately precede a :ref:`ret <i_ret>` instruction,
+ or a pointer bitcast followed by a ret instruction.
+ - The ret instruction must return the (possibly bitcasted) value
+ produced by the call or void.
+ - The caller and callee prototypes must match. Pointer types of
+ parameters or return types may differ in pointee type, but not
+ in address space.
+ - The calling conventions of the caller and callee must match.
+ - All ABI-impacting function attributes, such as sret, byval, inreg,
+ returned, and inalloca, must match.
+ - The callee must be varargs iff the caller is varargs. Bitcasting a
+ non-varargs function to the appropriate varargs type is legal so
+ long as the non-varargs prefixes obey the other rules.
+
+ Tail call optimization for calls marked ``tail`` is guaranteed to occur if
+ the following conditions are met:
- Caller and callee both have the calling convention ``fastcc``.
- The call is in tail position (ret immediately follows call and ret
uses value of call or is void).
- Option ``-tailcallopt`` is enabled, or
``llvm::GuaranteedTailCallOpt`` is ``true``.
- - `Platform specific constraints are
+ - `Platform-specific constraints are
met. <CodeGenerator.html#tailcallopt>`_
#. The optional "cconv" marker indicates which :ref:`calling
call void %foo(i8 97 signext)
%struct.A = type { i32, i8 }
- %r = call %struct.A @foo() ; yields { 32, i8 }
+ %r = call %struct.A @foo() ; yields { i32, i8 }
%gr = extractvalue %struct.A %r, 0 ; yields i32
%gr1 = extractvalue %struct.A %r, 1 ; yields i8
%Z = call void @foo() noreturn ; indicates that %foo never returns normally
other aggressive transformations, so the value returned may not be that
of the obvious source-language caller.
+.. _int_read_register:
+.. _int_write_register:
+
+'``llvm.read_register``' and '``llvm.write_register``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare i32 @llvm.read_register.i32(metadata)
+ declare i64 @llvm.read_register.i64(metadata)
+ declare void @llvm.write_register.i32(metadata, i32 @value)
+ declare void @llvm.write_register.i64(metadata, i64 @value)
+ !0 = metadata !{metadata !"sp\00"}
+
+Overview:
+"""""""""
+
+The '``llvm.read_register``' and '``llvm.write_register``' intrinsics
+provides access to the named register. The register must be valid on
+the architecture being compiled to. The type needs to be compatible
+with the register being read.
+
+Semantics:
+""""""""""
+
+The '``llvm.read_register``' intrinsic returns the current value of the
+register, where possible. The '``llvm.write_register``' intrinsic sets
+the current value of the register, where possible.
+
+This is useful to implement named register global variables that need
+to always be mapped to a specific register, as is common practice on
+bare-metal programs including OS kernels.
+
+The compiler doesn't check for register availability or use of the used
+register in surrounding code, including inline assembly. Because of that,
+allocatable registers are not supported.
+
+Warning: So far it only works with the stack pointer on selected
+architectures (ARM, AArch64, PowerPC and x86_64). Significant amount of
+work is needed to support other registers and even more so, allocatable
+registers.
+
.. _int_stacksave:
'``llvm.stacksave``' Intrinsic
Overview:
"""""""""
-The '``llvm.clear_cache``' intrinsic provides access to the systemcall
-that flushes the cache at the range specified. Some targets need this
-to specifically flush the instruction cache when executable data changes
-in memory (self-modifying code). Other targets have unified intruction
-and data cache, so they don't need any calls.
+The '``llvm.clear_cache``' intrinsic ensures visibility of modifications
+in the specified range to the execution unit of the processor. On
+targets with non-unified instruction and data cache, the implementation
+flushes the instruction cache.
Semantics:
""""""""""
-When directly supported, this intrinsic will either return a call to
-the appropriate cache clearing system call (usually ``__clear_cache``)
-when the caches are not unified (ARM, Mips) or just remove the call
-altogether when they are (ex. x86_64). Some targets can lower these
-directly into special instructions, if they have it.
+On platforms with coherent instruction and data caches (e.g. x86), this
+intrinsic is a nop. On platforms with non-coherent instruction and data
+cache (e.g. ARM, MIPS), the intrinsic is lowered either to appropriate
+instructions or a system call, if cache flushing requires special
+privileges.
-The default behaviour is to emit a call to ``__clear_cache``, so in
-case a target doesn't support it, the user gets a linker error rather
-than a compiler internal error. It also provides a work around to
-the user (implement an empty function called ``__clear_cache``) while
-LLVM doesn't implement it in the target's back-end.
+The default behavior is to emit a call to ``__clear_cache`` from the run
+time library.
-Please note that the caller is responsible for ensuring the cache
-is actually cleared. This is most important in targets that don't
-need to flush the cache directly (ex. x86_64) and could potentially
-still execute old instructions while the cache is not cleared. LLVM
-will *not* insert nops or busy-wait sequences.
+This instrinsic does *not* empty the instruction pipeline. Modifications
+of the current function are outside the scope of the intrinsic.
Standard C Library Intrinsics
-----------------------------
.. code-block:: llvm
- %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields {float}:r2 = (a * b) + c
+ %r2 = call float @llvm.fmuladd.f32(float %a, float %b, float %c) ; yields float:r2 = (a * b) + c
Half Precision Floating Point Intrinsics
----------------------------------------
::
- declare i16 @llvm.convert.to.fp16(f32 %a)
+ declare i16 @llvm.convert.to.fp16.f32(float %a)
+ declare i16 @llvm.convert.to.fp16.f64(double %a)
Overview:
"""""""""
-The '``llvm.convert.to.fp16``' intrinsic function performs a conversion
-from single precision floating point format to half precision floating
-point format.
+The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
+conventional floating point type to half precision floating point format.
Arguments:
""""""""""
Semantics:
""""""""""
-The '``llvm.convert.to.fp16``' intrinsic function performs a conversion
-from single precision floating point format to half precision floating
-point format. The return value is an ``i16`` which contains the
-converted number.
+The '``llvm.convert.to.fp16``' intrinsic function performs a conversion from a
+conventional floating point format to half precision floating point format. The
+return value is an ``i16`` which contains the converted number.
Examples:
"""""""""
.. code-block:: llvm
- %res = call i16 @llvm.convert.to.fp16(f32 %a)
+ %res = call i16 @llvm.convert.to.fp16.f32(float %a)
store i16 %res, i16* @x, align 2
.. _int_convert_from_fp16:
::
- declare f32 @llvm.convert.from.fp16(i16 %a)
+ declare float @llvm.convert.from.fp16.f32(i16 %a)
+ declare double @llvm.convert.from.fp16.f64(i16 %a)
Overview:
"""""""""
.. code-block:: llvm
%a = load i16* @x, align 2
- %res = call f32 @llvm.convert.from.fp16(i16 %a)
+ %res = call float @llvm.convert.from.fp16(i16 %a)
Debugger Intrinsics
-------------------
""""""""""
On some architectures the address of the code to be executed needs to be
-different to the address where the trampoline is actually stored. This
+different than the address where the trampoline is actually stored. This
intrinsic returns the executable address corresponding to ``tramp``
after performing the required machine specific adjustments. The pointer
returned can then be :ref:`bitcast and executed <int_trampoline>`.
Memory Use Markers
------------------
-This class of intrinsics exists to information about the lifetime of
+This class of intrinsics provides information about the lifetime of
memory objects and ranges where variables are immutable.
.. _int_lifestart:
This intrinsic is lowered to the ``val``.
+'``llvm.assume``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare void @llvm.assume(i1 %cond)
+
+Overview:
+"""""""""
+
+The ``llvm.assume`` allows the optimizer to assume that the provided
+condition is true. This information can then be used in simplifying other parts
+of the code.
+
+Arguments:
+""""""""""
+
+The condition which the optimizer may assume is always true.
+
+Semantics:
+""""""""""
+
+The intrinsic allows the optimizer to assume that the provided condition is
+always true whenever the control flow reaches the intrinsic call. No code is
+generated for this intrinsic, and instructions that contribute only to the
+provided condition are not used for code generation. If the condition is
+violated during execution, the behavior is undefined.
+
+Please note that optimizer might limit the transformations performed on values
+used by the ``llvm.assume`` intrinsic in order to preserve the instructions
+only used to form the intrinsic's input argument. This might prove undesirable
+if the extra information provided by the ``llvm.assume`` intrinsic does cause
+sufficient overall improvement in code quality. For this reason,
+``llvm.assume`` should not be used to document basic mathematical invariants
+that the optimizer can otherwise deduce or facts that are of little use to the
+optimizer.
+
'``llvm.donothing``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^