From: Brian Gaeke Date: Mon, 24 Nov 2003 17:03:38 +0000 (+0000) Subject: Apply doc patch from PR136. X-Git-Url: http://plrg.eecs.uci.edu/git/?a=commitdiff_plain;h=07e89e43df34ea6c1bfff9e247040f07f59d0d6c;p=oota-llvm.git Apply doc patch from PR136. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@10198 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/Stacker.html b/docs/Stacker.html index 81ad60e8fd6..eabccdf6cf1 100644 --- a/docs/Stacker.html +++ b/docs/Stacker.html @@ -6,9 +6,21 @@
Stacker: An Example Of Using LLVM
+
  1. Abstract
  2. Introduction
  3. +
  4. Lessons I Learned About LLVM +
      +
    1. Everything's a Value!
    2. +
    3. Terminate Those Blocks!
    4. +
    5. Concrete Blocks
    6. +
    7. push_back Is Your Friend
    8. +
    9. The Wily GetElementPtrInst
    10. +
    11. Getting Linkage Types Right
    12. +
    13. Constants Are Easier Than That!
    14. +
    +
  5. The Stacker Lexicon
    1. The Stack @@ -18,12 +30,24 @@
    2. Built-Ins
  6. -
  7. The Directory Structure +
  8. Prime: A Complete Example
  9. +
  10. Internal Code Details +
      +
    1. The Directory Structure
    2. +
    3. The Lexer
    4. +
    5. The Parser
    6. +
    7. The Compiler
    8. +
    9. The Runtime
    10. +
    11. Compiler Driver
    12. +
    13. Test Programs
    14. +
    +

Written by Reid Spencer

+
Abstract
@@ -80,31 +104,266 @@ written Stacker definitions have that characteristic.

Exercise for the reader: how could you make this a one line program?

-
Lessons Learned About LLVM
+
Lessons I Learned About LLVM

Stacker was written for two purposes: (a) to get the author over the learning curve and (b) to provide a simple example of how to write a compiler using LLVM. During the development of Stacker, many lessons about LLVM were learned. Those lessons are described in the following subsections.

+ +
Everything's a Value!
+
+

Although I knew that LLVM used a Static Single Assignment (SSA) format, +it wasn't obvious to me how prevalent this idea was in LLVM until I really +started using it. Reading the Programmer's Manual and Language Reference, I +noted that most of the important LLVM IR (Intermediate Representation) C++ +classes were derived from the Value class. I only fully understood the power +of that simple design once I started constructing executable +expressions for Stacker.

+

This really makes your programming go faster. Think about compiling code +for the following C/C++ expression: (a|b)*((x+1)/(y+1)). You could write a +function using LLVM that does exactly that, this way:

+

+Value* 
+expression(BasicBlock*bb, Value* a, Value* b, Value* x, Value* y )
+{
+    Instruction* tail = bb->getTerminator();
+    ConstantSInt* one = ConstantSInt::get( Type::IntTy, 1);
+    BinaryOperator* or1 =
+	BinaryOperator::create( Instruction::Or, a, b, "", tail );
+    BinaryOperator* add1 =
+	BinaryOperator::create( Instruction::Add, x, one, "", tail );
+    BinaryOperator* add2 =
+	BinaryOperator::create( Instruction::Add, y, one, "", tail );
+    BinaryOperator* div1 =
+	BinaryOperator::create( Instruction::Div, add1, add2, "", tail );
+    BinaryOperator* mult1 =
+	BinaryOperator::create( Instruction::Mul, or1, div1, "", tail );
+
+    return mult1;
+}
+
+

"Okay, big deal," you say. It is a big deal. Here's why. Note that I didn't +have to tell this function which kinds of Values are being passed in. They could be +instructions, Constants, Global Variables, etc. Furthermore, if you specify Values +that are incorrect for this sequence of operations, LLVM will either notice right +away (at compilation time) or the LLVM Verifier will pick up the inconsistency +when the compiler runs. In no case will you make a type error that gets passed +through to the generated program. This really helps you write a compiler +that always generates correct code!

+

The second point is that we don't have to worry about branching, registers, +stack variables, saving partial results, etc. The instructions we create +are the values we use. Note that all that was created in the above +code is a Constant value and five operators. Each of the instructions is +the resulting value of that instruction.

+

The lesson is this: SSA form is very powerful: there is no difference + between a value and the instruction that created it. This is fully +enforced by the LLVM IR. Use it to your best advantage.
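To see why "an instruction is a value" matters, here is a minimal, self-contained sketch of the same idea in plain C++. It is a toy model, not LLVM's API: the names Value, Constant, BinaryInst, Pool, and the eval() method are all hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy model only: these classes imitate the idea, not LLVM's real API.
struct Value {
    virtual ~Value() = default;
    virtual long eval() const = 0;
};

struct Constant : Value {
    long v;
    explicit Constant(long value) : v(value) {}
    long eval() const override { return v; }
};

enum class Op { Or, Add, Div, Mul };

// An instruction IS a value: its operands are Values, and the instruction
// itself can be used as an operand of a later instruction.
struct BinaryInst : Value {
    Op op;
    const Value* lhs;
    const Value* rhs;
    BinaryInst(Op o, const Value* l, const Value* r) : op(o), lhs(l), rhs(r) {
        assert(l && r);  // operands must exist
    }
    long eval() const override {
        long a = lhs->eval(), b = rhs->eval();
        switch (op) {
            case Op::Or:  return a | b;
            case Op::Add: return a + b;
            case Op::Div: return a / b;  // caller must avoid a zero divisor
            case Op::Mul: return a * b;
        }
        return 0;
    }
};

// Owns every node so the raw operand pointers stay valid.
struct Pool {
    std::vector<std::unique_ptr<Value>> nodes;
    template <class T, class... Args>
    const T* make(Args... args) {
        auto node = std::make_unique<T>(args...);
        const T* raw = node.get();
        nodes.push_back(std::move(node));
        return raw;
    }
};

// Builds (a|b)*((x+1)/(y+1)) without caring what kind of Value it is given.
const Value* expression(Pool& p, const Value* a, const Value* b,
                        const Value* x, const Value* y) {
    const Value* one  = p.make<Constant>(1);
    const Value* or1  = p.make<BinaryInst>(Op::Or, a, b);
    const Value* add1 = p.make<BinaryInst>(Op::Add, x, one);
    const Value* add2 = p.make<BinaryInst>(Op::Add, y, one);
    const Value* div1 = p.make<BinaryInst>(Op::Div, add1, add2);
    return p.make<BinaryInst>(Op::Mul, or1, div1);
}
```

Because expression() only asks for Values, its result can be fed straight back in as an operand of another call, which is the property the lesson above describes.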

+
+ +
Terminate Those Blocks!
+
+

I had to learn about terminating blocks the hard way: using the debugger +to figure out what the LLVM verifier was trying to tell me and begging for +help on the LLVMdev mailing list. I hope you avoid this experience.

+

Emblazon this rule in your mind:

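+ All BasicBlocks in your compiler must be terminated with a terminating +instruction (branch, return, etc.).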

Terminating instructions are a semantic requirement of the LLVM IR. There +is no facility for implicitly chaining together blocks placed into a function +in the order they occur. Indeed, in the general case, blocks will not be +added to the function in the order of execution because of the recursive +way compilers are written.

+

Furthermore, if you don't terminate your blocks, your compiler code will +compile just fine. You won't find out about the problem until you're running +the compiler and the module you just created fails on the LLVM Verifier.
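The kind of check that catches this mistake can be sketched in a few lines of plain C++. This is a toy model under stated assumptions: Kind, Block, unterminatedBlocks(), and requireTerminated() are hypothetical names, and the real LLVM Verifier checks far more than this one rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; not LLVM's API.
enum class Kind { Add, Load, Store, Branch, Return };

inline bool isTerminator(Kind k) {
    return k == Kind::Branch || k == Kind::Return;
}

struct Block {
    std::string name;
    std::vector<Kind> insts;
};

// Returns the names of blocks that break the rule: a block must end with a
// terminator, and a terminator may only appear as the last instruction.
std::vector<std::string> unterminatedBlocks(const std::vector<Block>& fn) {
    std::vector<std::string> bad;
    for (const Block& b : fn) {
        bool ok = !b.insts.empty() && isTerminator(b.insts.back());
        for (std::size_t i = 0; ok && i + 1 < b.insts.size(); ++i)
            if (isTerminator(b.insts[i])) ok = false;
        if (!ok) bad.push_back(b.name);
    }
    return bad;
}

// Aborts (in debug builds) when any block is malformed, much as a compiler
// would fail when its freshly built module is rejected by a verifier.
void requireTerminated(const std::vector<Block>& fn) {
    assert(unterminatedBlocks(fn).empty());
}
```

Note that nothing here runs at C++ compile time, which mirrors the point above: the mistake only shows up when the compiler you wrote is actually run.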

+
+ +
Concrete Blocks
+
+

After a little initial fumbling around, I quickly caught on to how blocks +should be constructed. The use of the standard template library really helps +simplify the interface. In general, here's what I learned: +

    +
  1. Create your blocks early. While writing your compiler, you + will encounter several situations where you know a priori that you will + need several blocks. For example, if-then-else, switch, while and for + statements in C/C++ all need multiple blocks to be expressed in LLVM. + The rule is: create them early.
  2. Terminate your blocks early. This just reduces the chances + that you forget to terminate your blocks, which is required (see + Terminate Those Blocks! for more). +
  3. Use getTerminator() for instruction insertion. I noticed early on + that many of the constructors for the Instruction classes take an optional + insert_before argument. At first, I thought this was a mistake + because clearly the normal mode of inserting instructions would be one at + a time after some other instruction, not before. However, + if you hold on to your terminating instruction (or use the handy dandy + getTerminator() method on a BasicBlock), it can + always be used as the insert_before argument to your instruction + constructors. This causes the instruction to automatically be inserted in + the Right Place™, just before the terminating instruction. The + nice thing about this design is that you can pass blocks around and insert + new instructions into them without ever knowing what instructions came + before. This makes for some very clean compiler design.
+

The foregoing is such an important principle that it's worth making an idiom:

+
+
+BasicBlock* bb = new BasicBlock();                      // 1. create the block early
+bb->getInstList().push_back( new BranchInst( ... ) );   // 2. terminate it right away
+new Instruction(..., bb->getTerminator() );             // 3. insert before the terminator
+
+
+

To make this clear, consider the typical if-then-else statement +(see StackerCompiler::handle_if() method). We can set this up +in a single function using LLVM in the following way:

+
+using namespace llvm;
+BasicBlock*
+MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
+{
+    // Create the blocks to contain code in the structure of if/then/else
+    BasicBlock* then_bb = new BasicBlock();
+    BasicBlock* else_bb = new BasicBlock();  // "else" is a C++ keyword, hence the _bb suffix
+    BasicBlock* exit_bb = new BasicBlock();
+
+    // Insert the branch instruction for the "if"
+    bb->getInstList().push_back( new BranchInst( then_bb, else_bb, condition ) );
+
+    // Set up the terminating instructions
+    then_bb->getInstList().push_back( new BranchInst( exit_bb ) );
+    else_bb->getInstList().push_back( new BranchInst( exit_bb ) );
+
+    // Fill in the then part .. details excised for brevity
+    this->fill_in( then_bb );
+
+    // Fill in the else part .. details excised for brevity
+    this->fill_in( else_bb );
+
+    // Return a block to the caller that can be filled in with the code
+    // that follows the if/then/else construct.
+    return exit_bb;
+}
+
+

Presumably in the foregoing, the calls to the "fill_in" method would add +the instructions for the "then" and "else" parts. They would use the third part +of the idiom almost exclusively (inserting new instructions before the +terminator). Furthermore, they could even recurse back to handle_if +should they encounter another if/then/else statement and it will all "just work". +

+

Note how cleanly this all works out. In particular, note the push_back methods on +the BasicBlock's instruction list. These are lists of type +Instruction, which also happen to be Values. To create +the "if" branch we merely instantiate a BranchInst that takes as +arguments the blocks to branch to and the condition to branch on. The blocks +act like branch labels! This new BranchInst terminates +the BasicBlock provided as an argument. To give the caller a way +to keep inserting after calling handle_if, we create an "exit" block +which is returned to the caller. Note that the "then" and the "else" blocks +are both terminated by a branch to the "exit" block. This guarantees that no +matter what else "handle_if" or "fill_in" does, they end up at the "exit" block. +
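The "terminate early, then insert before the terminator" idiom can be mimicked with nothing more than a std::list. The sketch below is a toy model, not LLVM's API: Block, terminator(), and insertBeforeTerminator() are hypothetical names, and instructions are reduced to strings.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Toy model only: a block is a list of instruction names whose last
// entry is always the terminator.
struct Block {
    std::list<std::string> insts;

    // Terminate the block as soon as it exists.
    explicit Block(const std::string& term) { insts.push_back(term); }

    // Plays the role of getTerminator(): an iterator to the last instruction.
    std::list<std::string>::iterator terminator() {
        assert(!insts.empty());
        return std::prev(insts.end());
    }

    // New instructions always go just before the terminator, so callers
    // never need to know what was inserted before them.
    void insertBeforeTerminator(const std::string& inst) {
        insts.insert(terminator(), inst);
    }
};
```

A block built this way can be handed to any number of helper functions; each one appends its instructions in call order and the terminator stays last.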

+
+ +
push_back Is Your Friend
+
+

+One of the first things I noticed is the frequent use of the "push_back" +method on the various lists. This is so common that it is worth mentioning. +The "push_back" method inserts a value into an STL list, vector, array, etc. at the +end. The method might have also been named "insert_tail" or "append". +Although I've used STL quite frequently, my use of push_back wasn't very +high in other programs. In LLVM, you'll use it all the time. +

+
+ +
The Wily GetElementPtrInst
+
+

+It took a little getting used to and several rounds of postings to the LLVM +mail list to wrap my head around this instruction correctly. Even though I had +read the Language Reference and Programmer's Manual a couple times each, I still +missed a few very key points: +

+ +

This means that when you look up an element in the global variable (assuming +it's a struct or array), you must dereference the pointer first! For many +things, this leads to the idiom: +

+

+std::vector<Value*> index_vector;
+index_vector.push_back( ConstantSInt::get( Type::LongTy, 0 ) );
+// ... push other indices ...
+GetElementPtrInst* gep = new GetElementPtrInst( ptr, index_vector );
+
+

For example, suppose we have a global variable whose type is [24 x int]. The +variable itself represents a pointer to that array. To subscript the +array, we need two indices, not just one. The first index (0) dereferences the +pointer. The second index subscripts the array. If you're a "C" programmer, this +will run against your grain because you'll naturally think of the global array +variable and the address of its first element as the same. That tripped me up +for a while until I realized that they really do differ .. by type. +Remember that LLVM is a strongly typed language itself. Absolutely everything +has a type. The "type" of the global variable is [24 x int]*. That is, it's +a pointer to an array of 24 ints. When you dereference that global variable with +a single index, you now have a "[24 x int]" type; the pointer is gone. Although +the pointer value of the dereferenced global and the address of the zero'th element +in the array will be the same, they differ in their type. The zero'th element has +type "int" while the dereferenced global has type "[24 x int]".
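C++ can express the same distinction if you take the address of the whole array rather than letting it decay. The following sketch is only an analogy for the two-index rule, not LLVM code; table, ArrayPtr, and elementAddress() are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

int table[24];  // stands in for a global of LLVM type [24 x int]

// In LLVM the global's name denotes a pointer to the whole array,
// which in C++ terms is int (*)[24].
using ArrayPtr = int (*)[24];

// Two "indices": the first (0) steps through the pointer to reach the
// array, the second selects an element inside that array.
int* elementAddress(ArrayPtr global, std::size_t first, std::size_t second) {
    assert(second < 24);  // stay inside the array
    return &global[first][second];
}

// Same address, different types: pointer-to-array versus pointer-to-int.
static_assert(std::is_same<decltype(&table), ArrayPtr>::value,
              "&table is a pointer to the whole array");
static_assert(!std::is_same<decltype(&table), decltype(&table[0])>::value,
              "pointer-to-array and pointer-to-element are distinct types");
```

Calling elementAddress(&table, 0, 5) yields the same address as &table[5], which is the C++ counterpart of pushing a leading zero index before the real subscript.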

+

Get this one aspect of LLVM right in your head and you'll save yourself +a lot of compiler writing headaches down the road.

+
+
Getting Linkage Types Right
-

To be completed.

-
Everything's a Value!
-

To be completed.

-
The Wily GetElementPtrInst
-

To be completed.

-
Constants Are Easier Than That!
-

To be completed.

-
Terminate Those Blocks!
-

To be completed.

-
new,get,create .. Its All The Same
-

To be completed.

-
Utility Functions To The Rescue
-

To be completed.

-
push_back Is Your Friend
-

To be completed.

-
Block Heads Come First
-

To be completed.

+
+

Linkage types in LLVM can be a little confusing, especially if your compiler +writing mind has affixed very hard concepts to particular words like "weak", +"external", "global", "linkonce", etc. LLVM does not use the precise +definitions of, say, ELF or GCC, even though they share common terms. To be fair, +the concepts are related and similar but not precisely the same. This can lead +you to think you know what a linkage type represents when in fact it is slightly +different. I recommend you read the + Language Reference on this topic very +carefully.

+

Here are some handy tips that I discovered along the way:

+ +
+ +
Constants Are Easier Than That!
+
+

+Constants in LLVM took a little getting used to until I discovered a few utility +functions in the LLVM IR that make things easier. Here's what I learned:

+ +
The Stacker Lexicon
The Stack
@@ -184,7 +443,7 @@ depending on what they do. The groups are as follows:

their operands.
The words are: ABS NEG + - * / MOD */ ++ -- MIN MAX
  • StackThese words manipulate the stack directly by moving its elements around.
    The words are: DROP DUP SWAP OVER ROT DUP2 DROP2 PICK TUCK
  • -
  • Memory>These words allocate, free and manipulate memory +
  • MemoryThese words allocate, free and manipulate memory areas outside the stack.
    The words are: MALLOC FREE GET PUT
  • ControlThese words alter the normal left to right flow of execution.
    The words are: IF ELSE ENDIF WHILE END RETURN EXIT RECURSE
  • @@ -696,39 +955,19 @@ using the following construction:

    -
    Directory Structure
    -
    -

    The source code, test programs, and sample programs can all be found -under the LLVM "projects" directory. You will need to obtain the LLVM sources -to find it (either via anonymous CVS or a tarball. See the -Getting Started document).

    -

    Under the "projects" directory there is a directory named "stacker". That -directory contains everything, as follows:

    -
    - -
    Prime: A Complete Example
    +
    Prime: A Complete Example
    -

    The following fully documented program highlights many of features of both -the Stacker language and what is possible with LLVM. The program simply -prints out the prime numbers until it reaches +

    The following fully documented program highlights many features of both +the Stacker language and what is possible with LLVM. The program has two modes +of operation. If you provide numeric arguments to the program, it checks to see +if those arguments are prime numbers and prints out the results. Without any +arguments, the program prints out any prime numbers it finds between 1 and one +million (there's a lot of them!). The source code comments below tell the +remainder of the story.

    -

    - ################################################################################ # # Brute force prime number generator @@ -964,19 +1203,68 @@ prints out the prime numbers until it reaches ENDIF 0 ( push return code ) ; -]]> -

    +
    - -

    To be completed.

    -
    The Lexer
    -
    The Parser
    -
    The Compiler
    -
    The Stack
    -
    Definitions Are Functions
    -
    Words Are BasicBlocks
    + +
    +

    This section is under construction. +

    In the meantime, you can always read the code! It has comments!

    +
    + + +
    +

    The source code, test programs, and sample programs can all be found +under the LLVM "projects" directory. You will need to obtain the LLVM sources +to find them (either via anonymous CVS or a tarball; see the +Getting Started document).

    +

    Under the "projects" directory there is a directory named "stacker". That +directory contains everything, as follows:

    +
      +
    • lib - contains most of the source code +
        +
      • lib/compiler - contains the compiler library +
      • lib/runtime - contains the runtime library +
    • +
    • test - contains the test programs
    • +
    • tools - contains the Stacker compiler main program, stkrc +
        +
      • tools/stkrc - contains the Stacker compiler main program +
      • sample - contains the sample programs
      • +
      +
    + +
    The Lexer
    +
    +

    See projects/Stacker/lib/compiler/Lexer.l

    +

    + +
    The Parser
    +
    +

    See projects/Stacker/lib/compiler/StackerParser.y

    +

    + +
    The Compiler
    +
    +

    See projects/Stacker/lib/compiler/StackerCompiler.cpp

    +

    + +
    The Runtime
    +
    +

    See projects/Stacker/lib/runtime/stacker_rt.c

    +

    + +
    Compiler Driver
    +
    +

    See projects/Stacker/tools/stkrc/stkrc.cpp

    +

    + +
    Test Programs
    +
    +

    See projects/Stacker/test/*.st

    +