X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FLangRef.html;h=e8c1f01cee16ccc4b0f886d16113a3cc8daf0955;hb=10610646cbe02d5a018ddc1092f3b17e6325a94d;hp=ffae318b29f0e148820b5061185ad88c3c3a49df;hpb=a1a209791982cd19831fbc9cc9152b222263bcd2;p=oota-llvm.git diff --git a/docs/LangRef.html b/docs/LangRef.html index ffae318b29f..e8c1f01cee1 100644 --- a/docs/LangRef.html +++ b/docs/LangRef.html @@ -1,1458 +1,1429 @@ - -
LLVM Language Reference Manual | -
Written by Chris Lattner and Vikram Adve
- - +
Written by Chris Lattner +and Vikram Adve
++
-Abstract - |
- This document is a reference manual for the LLVM assembly language. LLVM is - an SSA based representation that provides type safety, low level operations, - flexibility, and the capability of representing 'all' high level languages - cleanly. It is the common code representation used throughout all phases of - the LLVM compilation strategy. -- - - - +
This document is a reference manual for the LLVM assembly language. +LLVM is an SSA-based representation that provides type safety, +low-level operations, flexibility, and the capability of representing +'all' high-level languages cleanly. It is the common code +representation used throughout all phases of the LLVM compilation +strategy.

+-Introduction - |
- -The LLVM representation aims to be a light weight and low level while being -expressive, typed, and extensible at the same time. It aims to be a "universal -IR" of sorts, by being at a low enough level that high level ideas may be -cleanly mapped to it (similar to how microprocessors are "universal IR's", -allowing many source languages to be mapped to them). By providing type -information, LLVM can be used as the target of optimizations: for example, -through pointer analysis, it can be proven that a C automatic variable is never -accessed outside of the current function... allowing it to be promoted to a -simple SSA value instead of a memory location.
- +
The LLVM code representation is designed to be used in three +different forms: as an in-memory compiler IR, as an on-disk bytecode +representation (suitable for fast loading by a Just-In-Time compiler), +and as a human-readable assembly language representation. This allows +LLVM to provide a powerful intermediate representation for efficient +compiler transformations and analysis, while providing a natural means +to debug and visualize the transformations. The three different forms +of LLVM are all equivalent. This document describes the human-readable +representation and notation.

+The LLVM representation aims to be light-weight and low-level +while being expressive, typed, and extensible at the same time. It +aims to be a "universal IR" of sorts, by being at a low enough level +that high-level ideas may be cleanly mapped to it (similar to how +microprocessors are "universal IRs", allowing many source languages to +be mapped to them). By providing type information, LLVM can be used as +the target of optimizations: for example, through pointer analysis, it +can be proven that a C automatic variable is never accessed outside of +the current function, allowing it to be promoted to a simple SSA +value instead of a memory location.

+- -
- %x = add int 1, %x -- -...because the definition of %x does not dominate all of its uses. The -LLVM infrastructure provides a verification pass that may be used to verify that -an LLVM module is well formed. This pass is automatically run by the parser -after parsing input assembly, and by the optimizer before it outputs bytecode. -The violations pointed out by the verifier pass indicate bugs in transformation -passes or input to the parser.
- - - - +
+It is important to note that this document describes 'well formed' +LLVM assembly language. There is a difference between what the parser +accepts and what is considered 'well formed'. For example, the +following instruction is syntactically okay, but not well formed:
+%x = add int 1, %x+
...because the definition of %x does not dominate all of +its uses. The LLVM infrastructure provides a verification pass that may +be used to verify that an LLVM module is well formed. This pass is +automatically run by the parser after parsing input assembly, and by +the optimizer before it outputs bytecode. The violations pointed out +by the verifier pass indicate bugs in transformation passes or input to +the parser.
+-Identifiers - |
- +
LLVM uses three different forms of identifiers, for different +purposes:
- -LLVM requires the values start with a '%' sign for two reasons: Compilers don't -need to worry about name clashes with reserved words, and the set of reserved -words may be expanded in the future without penalty. Additionally, unnamed -identifiers allow a compiler to quickly come up with a temporary variable -without having to avoid symbol table conflicts.
- -Reserved words in LLVM are very similar to reserved words in other languages. -There are keywords for different opcodes ('add', -'cast', 'ret', -etc...), for primitive type names ('void', -'uint', etc...), and others. These reserved -words cannot conflict with variable names, because none of them start with a '%' -character.
- -Here is an example of LLVM code to multiply the integer variable '%X' -by 8:
- -The easy way: -
- %result = mul uint %X, 8 -- -After strength reduction: -
- %result = shl uint %X, ubyte 3 -- -And the hard way: -
- add uint %X, %X ; yields {uint}:%0 - add uint %0, %0 ; yields {uint}:%1 - %result = add uint %1, %1 -- -This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:
- +
LLVM requires that values start with a '%' sign for two reasons: +Compilers don't need to worry about name clashes with reserved words, +and the set of reserved words may be expanded in the future without +penalty. Additionally, unnamed identifiers allow a compiler to quickly +come up with a temporary variable without having to avoid symbol table +conflicts.

+Reserved words in LLVM are very similar to reserved words in other +languages. There are keywords for different opcodes ('add', 'cast', 'ret', etc.), for primitive type names ('void', 'uint', +etc.), and others. These reserved words cannot conflict with +variable names, because none of them start with a '%' character.

+Here is an example of LLVM code to multiply the integer variable '%X' +by 8:
+The easy way:
+%result = mul uint %X, 8+
After strength reduction:
+%result = shl uint %X, ubyte 3+
And the hard way:
+add uint %X, %X ; yields {uint}:%0 + add uint %0, %0 ; yields {uint}:%1 + %result = add uint %1, %1+
This last way of multiplying %X by 8 illustrates several +important lexical features of LLVM:
- -...and it also show a convention that we follow in this document. When -demonstrating instructions, we will follow an instruction with a comment that -defines the type and name of value produced. Comments are shown in italic -text.
-
-The one non-intuitive notation for constants is the optional hexidecimal form of
-floating point constants. For example, the form 'double
+
...and it also shows a convention that we follow in this document. +When demonstrating instructions, we will follow an instruction with a +comment that defines the type and name of the value produced. Comments are +shown in italic text.

+The one non-intuitive notation for constants is the optional +hexadecimal form of floating point constants. For example, the form 'double 0x432ff973cafa8000' is equivalent to (but harder to read than) 'double -4.5e+15' which is also supported by the parser. The only time hexadecimal -floating point constants are useful (and the only time that they are generated -by the disassembler) is when an FP constant has to be emitted that is not -representable as a decimal floating point number exactly. For example, NaN's, -infinities, and other special cases are represented in their IEEE hexadecimal -format so that assembly and disassembly do not cause any bits to change in the -constants.

- - +4.5e+15' which is also supported by the parser. The only time +hexadecimal floating point constants are useful (and the only time that +they are generated by the disassembler) is when an FP constant has to +be emitted that is not representable as a decimal floating point number +exactly. For example, NaNs, infinities, and other special cases are +represented in their IEEE hexadecimal format so that assembly and +disassembly do not cause any bits to change in the constants.

+-Type System - |
- +
The LLVM type system is one of the most important features of the +intermediate representation. Being typed enables a number of +optimizations to be performed on the IR directly, without having to do +extra analyses on the side before the transformation. A strong type +system makes it easier to read the generated code and enables novel +analyses and transformations that are not feasible to perform on normal +three-address code representations.

- - - +href="#rw_stroustrup">1.-->
-Primitive Types - |
- -