X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FLangRef.html;h=23911e2540f43504fd172e0c536f61e35f9d8141;hb=120bc6da2fe9b6706de2f5ee890a844cb91ae2d5;hp=a664f328d5d54596850162c028d22f03c2038306;hpb=478921b11a9e9f6ffac0c4eb96c4275e2b2ce3cd;p=oota-llvm.git
diff --git a/docs/LangRef.html b/docs/LangRef.html
index a664f328d5d..23911e2540f 100644
--- a/docs/LangRef.html
+++ b/docs/LangRef.html
@@ -1,865 +1,1307 @@
LLVM Language Reference Manual
Written by Chris Lattner and Vikram Adve
Abstract
This document is a reference manual for the LLVM assembly language. LLVM is an SSA-based representation that provides type safety, low-level operations, flexibility, and the capability of representing 'all' high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.
Introduction
The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bytecode representation (suitable for fast loading by a Just-In-Time compiler), and as a human-readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. This document describes the human-readable representation and notation.
The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are "universal IRs", allowing many source languages to be mapped to them). By providing type information, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function, allowing it to be promoted to a simple SSA value instead of a memory location.
It is important to note that this document describes 'well formed' LLVM assembly language. There is a difference between what the parser accepts and what is considered 'well formed'. For example, the following instruction is syntactically okay, but not well formed:
%x = add int 1, %x
...because the definition of %x does not dominate all of its uses. The LLVM infrastructure provides a verification pass that may be used to verify that an LLVM module is well formed. This pass is automatically run by the parser after parsing input assembly and by the optimizer before it outputs bytecode. The violations pointed out by the verifier pass indicate bugs in transformation passes or input to the parser.
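The verifier's real dominance check works over the whole control flow graph. As a much smaller sketch, restricted by assumption to a single basic block with no branches, "the definition dominates the use" reduces to "the value is a function argument or is defined by an earlier instruction in the block". The Inst type and block_is_well_formed are hypothetical names for illustration, not LLVM APIs, and global values are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one instruction in a single basic block. */
typedef struct {
    const char *def;     /* value defined, e.g. "%x" (NULL if none) */
    const char *use[2];  /* named operands (NULL for constants or unused) */
} Inst;

/* Returns true if every named operand is an argument or is defined by a
 * strictly earlier instruction. An instruction never dominates its own
 * operands, which is why "%x = add int 1, %x" is rejected. */
static bool block_is_well_formed(const Inst *insts, int n,
                                 const char *const *args, int nargs) {
    for (int i = 0; i < n; i++) {
        for (int u = 0; u < 2; u++) {
            const char *name = insts[i].use[u];
            if (name == NULL)
                continue;
            bool found = false;
            for (int a = 0; a < nargs && !found; a++)
                found = strcmp(args[a], name) == 0;
            for (int j = 0; j < i && !found; j++)  /* only earlier instructions */
                found = insts[j].def != NULL && strcmp(insts[j].def, name) == 0;
            if (!found)
                return false;
        }
    }
    return true;
}
```

Across several blocks, the "earlier in the block" test would have to be replaced by a walk of the dominator tree. The single-block case only shows the rule itself.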
Identifiers
LLVM uses three different forms of identifiers for different purposes:
LLVM requires that values start with a '%' sign for two reasons: compilers don't need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.
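Unnamed identifiers are cheap for a compiler because a simple per-function counter is enough to create fresh temporaries. The following is a minimal sketch. It assumes, following LLVM's naming rules, that user-chosen names cannot begin with a digit, so %0, %1, ... can never collide with a named value. TempNamer and next_temp are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-function counter for unnamed temporaries. */
typedef struct {
    unsigned next;
} TempNamer;

/* Writes the next temporary name ("%0", "%1", ...) into buf. No symbol
 * table lookup is needed, because named values never start with a digit. */
static void next_temp(TempNamer *t, char *buf, size_t len) {
    snprintf(buf, len, "%%%u", t->next++);
}
```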
Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc.), for primitive type names ('void', 'uint', etc.), and others. These reserved words cannot conflict with variable names, because none of them start with a '%' character.

Here is an example of LLVM code to multiply the integer variable '%X' by 8:

The easy way:
%result = mul uint %X, 8
After strength reduction:
%result = shl uint %X, ubyte 3
And the hard way:
add uint %X, %X           ; yields {uint}:%0
add uint %0, %0           ; yields {uint}:%1
%result = add uint %1, %1

This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:
...and it also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of the value produced. Comments are shown in italic text.
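The three ways of multiplying %X by 8 shown above should always produce the same result. This sketch checks that claim in C. It assumes, as in LLVM of this era, that 'uint' is a 32-bit unsigned type whose arithmetic wraps modulo 2^32, and mirrors it with uint32_t. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* mul uint %X, 8 */
static uint32_t mul_by_8(uint32_t x) { return (uint32_t)(x * 8u); }

/* shl uint %X, ubyte 3 */
static uint32_t shl_by_3(uint32_t x) { return (uint32_t)(x << 3); }

/* The "hard way": three additions, each doubling the previous value. */
static uint32_t add_thrice(uint32_t x) {
    uint32_t t0 = x + x;    /* %0 = 2 * %X */
    uint32_t t1 = t0 + t0;  /* %1 = 4 * %X */
    return t1 + t1;         /* %result = 8 * %X */
}
```

Because all three forms wrap in the same way, they agree even for inputs whose product overflows 32 bits. This is what makes the strength reduction from mul to shl safe.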
The one non-intuitive notation for constants is the optional hexadecimal form of floating point constants. For example, the form 'double 0x432ff973cafa8000' is equivalent to (but harder to read than) 'double 4.5e+15', which is also supported by the parser. The only time hexadecimal floating point constants are useful (and the only time that they are generated by the disassembler) is when an FP constant has to be emitted that is not exactly representable as a decimal floating point number. For example, NaNs, infinities, and other special cases are represented in their IEEE hexadecimal format so that assembly and disassembly do not cause any bits to change in the constants.
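The equivalence above can be checked directly. The sketch assumes that C's double is IEEE-754 binary64 and shares byte order with uint64_t, which holds on mainstream platforms. Copying a double's bytes into a 64-bit integer exposes the bit pattern that the hexadecimal form spells out. double_bits and bits_double are hypothetical helper names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bits of a double; memcpy is the well-defined way to
 * reinterpret the bytes without violating aliasing rules. */
static uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* The reverse direction: build a double from an exact bit pattern. */
static double bits_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Round-tripping through the bit pattern preserves every bit, including the sign and payload of NaNs. A decimal spelling cannot guarantee this.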
Type System
LLVM programs are composed of "Module"s, each of which is a translation unit of the input programs. Each module consists of functions, global variables, and symbol table entries. Modules may be combined together with the LLVM linker, which merges function (and global variable) definitions, resolves forward declarations, and merges symbol table entries. Here is an example of the "hello world" module:

; Declare the string constant as a global constant...
%.LC0 = internal constant [13 x sbyte] c"hello world\0A\00"          ; [13 x sbyte]*

; External declaration of the puts function
declare int %puts(sbyte*)                                           ; int(sbyte*)*

; Definition of main function
int %main() {                                                       ; int()*
        ; Convert [13x sbyte]* to sbyte *...
        %cast210 = getelementptr [13 x sbyte]* %.LC0, long 0, long 0 ; sbyte*

        ; Call puts function to write out the string to stdout...
        call int %puts(sbyte* %cast210)                              ; int
        ret int 0
}
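For readers coming from C, the module above corresponds roughly to the following translation unit. This is a hand-written sketch, not the output of any particular front end. LC0, cast210, and hello_main are hypothetical stand-ins: ".LC0" is not a valid C name, and main is renamed so the sketch can be linked with a separate test driver. The module's "internal" global maps onto C's file-local static storage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* %.LC0: an internal constant array of 13 bytes, "hello world", a
 * newline (\0A) and the terminating NUL (\00). */
static const char LC0[13] = "hello world\n";

/* getelementptr [13 x sbyte]* %.LC0, long 0, long 0: index 0 steps
 * through the pointer to the array, the second index 0 selects the first
 * element, giving a pointer to the first character. */
static const char *cast210(void) { return &LC0[0]; }

/* Body of %main. puts appends its own newline, so "hello world" is
 * followed by a blank line, exactly as with the LLVM module. */
static int hello_main(void) {
    puts(cast210());
    return 0;
}
```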
This example is made up of a global variable named ".LC0", an external declaration of the "puts" function, and a function definition for "main".
In general, a module is made up of a list of global values, where both functions and global variables are global values. Global values are represented by a pointer to a memory location (in this case, a pointer to an array of char, and a pointer to a function), and have one of the following linkage types.
Primitive Types

Linkage Types

All global variables and functions have one of the following types of linkage:
Classification | Types
signed         | sbyte, short, int, long, float, double
unsigned       | ubyte, ushort, uint, ulong
integer        | ubyte, sbyte, ushort, short, uint, int, ulong, long
integral       | bool, ubyte, sbyte, ushort, short, uint, int, ulong, long
floating point | float, double
first class    | bool, ubyte, sbyte, ushort, short, uint, int, ulong, long, float, double, pointer
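The classification table above is easy to misread because the classes overlap. For example, bool is integral but not an integer, and float and double count as signed. The following sketch encodes the table as bit flags so that such questions can be answered mechanically. The flag names, kTypes, and has_class are hypothetical, not an LLVM API. "pointer" stands for any pointer type.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One flag per classification column of the table. */
enum {
    SIGNED = 1 << 0, UNSIGNED = 1 << 1, INTEGER = 1 << 2,
    INTEGRAL = 1 << 3, FLOATING = 1 << 4, FIRST_CLASS = 1 << 5
};

static const struct { const char *name; unsigned cls; } kTypes[] = {
    {"bool",    INTEGRAL | FIRST_CLASS},
    {"ubyte",   UNSIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"sbyte",   SIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"ushort",  UNSIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"short",   SIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"uint",    UNSIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"int",     SIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"ulong",   UNSIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"long",    SIGNED | INTEGER | INTEGRAL | FIRST_CLASS},
    {"float",   SIGNED | FLOATING | FIRST_CLASS},
    {"double",  SIGNED | FLOATING | FIRST_CLASS},
    {"pointer", FIRST_CLASS},
};

/* True if the named type belongs to every class in mask; false for
 * unknown names. */
static bool has_class(const char *name, unsigned mask) {
    for (size_t i = 0; i < sizeof kTypes / sizeof kTypes[0]; i++)
        if (strcmp(kTypes[i].name, name) == 0)
            return (kTypes[i].cls & mask) == mask;
    return false;
}
```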
Derived Types

LLVM functions, calls and invokes can all have an optional calling convention specified for the call. The calling convention of any pair of dynamic caller/callee must match, or the behavior of the program is undefined. The following calling conventions are supported by LLVM, and more may be added in the future:

More calling conventions can be added or defined on an as-needed basis, to support Pascal conventions or any other well-known target-independent convention.

  <returntype> (<parameter list>)
Global variables define regions of memory allocated at compilation time instead of run-time. Global variables may optionally be initialized. A variable may be defined as a global "constant", which indicates that the contents of the variable will never be modified (enabling better optimization, allowing the global data to be placed in the read-only section of an executable, etc.). Note that variables that need runtime initialization cannot be marked "constant", as there is a store to the variable.
Where '<parameter list>' is a comma-separated list of type specifiers. Optionally, the parameter list may include a type '...', which indicates that the function takes a variable number of arguments. Variable argument functions can access their arguments with the variable argument handling intrinsic functions.

LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the 'constantness' are valid for the translation units that do not include the definition.
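A function type whose parameter list ends in '...', such as an int (int, ...), has a direct C counterpart. This is a sketch only. C's <stdarg.h> macros are what a C front end typically lowers, at least in part, to LLVM's variable argument handling intrinsics. The exact lowering is not specified here, and sum_ints is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>

/* One fixed parameter (the count), then a variable number of ints. */
static int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);           /* begin walking the '...' arguments */
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* fetch the next argument as an int */
    va_end(ap);                    /* release the argument list */
    return total;
}
```

The callee learns how many variable arguments it received only from a convention, here the leading count. The function type itself records only that extra arguments are allowed.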
As SSA values, global variables define pointer values that are in scope in (i.e., dominate) all basic blocks in the program. Global variables always define a pointer to their "content" type because they describe a region of memory, and all memory objects in LLVM are accessed through pointers.
LLVM function definitions consist of an optional linkage type, an optional calling convention, a return type, a function name, a (possibly empty) argument list, an opening curly brace, a list of basic blocks, and a closing curly brace. LLVM function declarations are defined with the "declare" keyword, an optional calling convention, a return type, a function name, and a possibly empty list of arguments.
A function definition contains a list of basic blocks, forming the CFG for the function. Each basic block may optionally start with a label (giving the basic block a symbol table entry), contains a list of instructions, and ends with a terminator instruction (such as a branch or function return).
The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e., there cannot be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.
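The structural rules for basic blocks can be checked mechanically. The following sketch tests three of them on a toy CFG: every block ends with a terminator, no branch targets the entry block, and the entry block holds no PHI nodes. It also assumes, as LLVM does, that a terminator may appear only as the last instruction of a block. The opcode set and the CfgInst, Block, and cfg_is_well_formed names are hypothetical simplifications, not LLVM data structures.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_PHI, OP_ADD, OP_BR, OP_RET } Op;

typedef struct {
    Op op;
    int succ[2];   /* successor block indices for OP_BR; negative if unused */
} CfgInst;

typedef struct {
    const CfgInst *insts;
    int n;
} Block;

static bool is_terminator(Op op) { return op == OP_BR || op == OP_RET; }

/* Block 0 is the entry block. */
static bool cfg_is_well_formed(const Block *blocks, int nblocks) {
    for (int b = 0; b < nblocks; b++) {
        const Block *bb = &blocks[b];
        if (bb->n == 0 || !is_terminator(bb->insts[bb->n - 1].op))
            return false;                       /* must end with a terminator */
        for (int i = 0; i < bb->n; i++) {
            const CfgInst *in = &bb->insts[i];
            if (is_terminator(in->op) && i != bb->n - 1)
                return false;                   /* terminator in mid-block */
            if (b == 0 && in->op == OP_PHI)
                return false;                   /* entry block has no PHIs */
            if (in->op != OP_BR)
                continue;
            for (int s = 0; s < 2; s++) {
                int t = in->succ[s];
                if (t < 0)
                    continue;                   /* unused successor slot */
                if (t == 0 || t >= nblocks)
                    return false;               /* branch to entry or nowhere */
            }
        }
    }
    return true;
}
```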
LLVM functions are identified by their name and type signature. Hence, two functions with the same name but different parameter lists or return values are considered different functions, and LLVM will resolve references to each appropriately.
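Identifying a function by its name together with its type signature can be modeled as a lookup keyed on the pair. This is a minimal sketch. Signatures are written as plain strings for simplicity, and FuncKey, same_function, and lookup are hypothetical names, not part of LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A function's identity: its name plus its type signature. */
typedef struct {
    const char *name;
    const char *type;
} FuncKey;

static bool same_function(FuncKey a, FuncKey b) {
    return strcmp(a.name, b.name) == 0 && strcmp(a.type, b.type) == 0;
}

/* Index of the entry matching both name and type, or -1 if none does. */
static int lookup(const FuncKey *table, int n, FuncKey key) {
    for (int i = 0; i < n; i++)
        if (same_function(table[i], key))
            return i;
    return -1;
}
```

With this keying, a reference to %foo of type int (double) resolves to a different entry than %foo of type int (int), and a reference with an unknown signature resolves to neither.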