From: Chris Lattner Date: Wed, 31 Oct 2007 06:30:21 +0000 (+0000) Subject: Add the first half of chapter 5: if/then/else. X-Git-Url: http://plrg.eecs.uci.edu/git/?a=commitdiff_plain;h=602c832c208f64484e83282f0e80fcb8edda11a7;p=oota-llvm.git Add the first half of chapter 5: if/then/else. To come: for statement. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43546 91177308-0d34-0410-b5e6-96231b3b80d8 --- diff --git a/docs/tutorial/LangImpl5-cfg.png b/docs/tutorial/LangImpl5-cfg.png new file mode 100644 index 00000000000..cdba92ff6c5 Binary files /dev/null and b/docs/tutorial/LangImpl5-cfg.png differ diff --git a/docs/tutorial/LangImpl5.html b/docs/tutorial/LangImpl5.html new file mode 100644 index 00000000000..d1f81efb866 --- /dev/null +++ b/docs/tutorial/LangImpl5.html @@ -0,0 +1,523 @@ + + + + + Kaleidoscope: Extending the Language: Control Flow + + + + + + + +
Kaleidoscope: Extending the Language: Control Flow
+ +
+

Written by Chris Lattner

+
+ + +
Part 5 Introduction
+ + +
+ +

Welcome to Part 5 of the "Implementing a language with +LLVM" tutorial. Parts 1-4 described the implementation of the simple +Kaleidoscope language and included support for generating LLVM IR, followed by +optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is +mostly useless: it has no control flow other than call and return. This means +that you can't have conditional branches in the code, significantly limiting its +power. In this episode of "build that compiler", we'll extend Kaleidoscope to +have an if/then/else expression plus a simple looping construct.

+ +
+ + +
If/Then/Else
+ + +
+ +

+Extending Kaleidoscope to support if/then/else is quite straight-forward. It +basically requires adding support for this "new" concept to the lexer, +parser, AST, and LLVM code emitter. This example is nice, because it shows how +easy it is to "grow" a language over time, incrementally extending it as new +ideas are discovered.

+ +

Before we get going on "how" we do this extension, let's talk about what we +want. The basic idea is that we want to be able to write this sort of thing: +

+ +
+
+def fib(x)
+  if x < 3 then
+    1
+  else
+    fib(x-1)+fib(x-2);
+
+
+ +

In Kaleidoscope, every construct is an expression: there are no statements. +As such, the if/then/else expression needs to return a value like any other. +Since we're using a mostly functional form, we'll have it evaluate its +conditional, then return the 'then' or 'else' value based on how the condition +was resolved. This is very similar to the C "?:" expression.

+ +

The semantics of the if/then/else expression are that it first evaluates the +condition to a boolean value: 0.0 is false and everything else is true. +If the condition is true, the first subexpression is evaluated and returned; if +the condition is false, the second subexpression is evaluated and returned. +Since Kaleidoscope allows side-effects, this behavior is important to nail down. +
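The evaluation rule can be tried out without any compiler machinery. What follows is a minimal sketch in plain C++, not code from the tutorial: `evalIf` and `traceIf` are hypothetical helpers, and the branches are passed as callables so that only the selected one runs. It models the prose rule only; how a NaN condition behaves depends on the comparison the code generator eventually emits.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of the tutorial's code): models how a
// Kaleidoscope if/then/else is evaluated. The branches are thunks, so only
// the selected branch is ever executed.
double evalIf(double Cond,
              const std::function<double()> &Then,
              const std::function<double()> &Else) {
  // 0.0 is false, every other value is treated as true.
  return Cond != 0.0 ? Then() : Else();
}

// Evaluates "if Cond then foo() else bar()" with stand-in functions that
// record their own invocation, and returns the list of calls that happened.
std::vector<std::string> traceIf(double Cond) {
  std::vector<std::string> Log;
  auto Foo = [&Log] { Log.push_back("foo"); return 1.0; };
  auto Bar = [&Log] { Log.push_back("bar"); return 2.0; };
  double Result = evalIf(Cond, Foo, Bar);
  assert(Result == 1.0 || Result == 2.0);
  (void)Result;
  return Log;
}
```

Calling `traceIf` with a non-zero argument records only the `foo` call, and calling it with 0.0 records only `bar`, which is the property that matters once expressions can have side effects.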

+ +

Now that we know what we want, let's break this down into its constituent +pieces.

+ +
+ + +
Lexer Extensions for +If/Then/Else
+ + + +
+ +

The lexer extensions are straight-forward. First we add new enum values +for the relevant tokens:

+ +
+
+  // control
+  tok_if = -6, tok_then = -7, tok_else = -8,
+
+
+ +

Once we have that, we recognize the new keywords in the lexer, pretty simple +stuff:

+ +
+
+    ...
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    if (IdentifierStr == "if") return tok_if;
+    if (IdentifierStr == "then") return tok_then;
+    if (IdentifierStr == "else") return tok_else;
+    return tok_identifier;
+
+
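To see the keyword recognition in isolation, here is a small self-contained sketch. The helper name `classifyIdentifier` is hypothetical (the tutorial performs these comparisons inline in its lexer), and the first five enum values are assumed to match the ones defined in the earlier chapters.

```cpp
#include <cassert>
#include <string>

// Token codes. The first five are assumed to match the earlier chapters of
// the tutorial; the last three are the ones added in this chapter.
enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5,
  tok_if = -6, tok_then = -7, tok_else = -8
};

// Hypothetical standalone helper: maps an already-scanned identifier string
// to its token code. Anything that is not a keyword is a plain identifier.
int classifyIdentifier(const std::string &IdentifierStr) {
  assert(!IdentifierStr.empty() && "an identifier has at least one character");
  if (IdentifierStr == "def") return tok_def;
  if (IdentifierStr == "extern") return tok_extern;
  if (IdentifierStr == "if") return tok_if;
  if (IdentifierStr == "then") return tok_then;
  if (IdentifierStr == "else") return tok_else;
  return tok_identifier;
}
```

Because the comparison is against the whole identifier, a name such as `iffy` or `elsewhere` is still an ordinary identifier, and the match is case-sensitive.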
+ +
+ + +
AST Extensions for + If/Then/Else
+ + +
+ +

To represent the new expression we add a new AST node for it:

+ +
+
+/// IfExprAST - Expression class for if/then/else.
+class IfExprAST : public ExprAST {
+  ExprAST *Cond, *Then, *Else;
+public:
+  IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
+    : Cond(cond), Then(then), Else(_else) {}
+  virtual Value *Codegen();
+};
+
+
+ +

The AST node just has pointers to the various subexpressions.

+ +
+ + +
Parser Extensions for +If/Then/Else
+ + +
+ +

Now that we have the relevant tokens coming from the lexer and we have the +AST node to build, our parsing logic is relatively straight-forward. First we +define a new parsing function:

+ +
+
+/// ifexpr ::= 'if' expression 'then' expression 'else' expression
+static ExprAST *ParseIfExpr() {
+  getNextToken();  // eat the if.
+  
+  // condition.
+  ExprAST *Cond = ParseExpression();
+  if (!Cond) return 0;
+  
+  if (CurTok != tok_then)
+    return Error("expected then");
+  getNextToken();  // eat the then
+  
+  ExprAST *Then = ParseExpression();
+  if (Then == 0) return 0;
+  
+  if (CurTok != tok_else)
+    return Error("expected else");
+  
+  getNextToken();
+  
+  ExprAST *Else = ParseExpression();
+  if (!Else) return 0;
+  
+  return new IfExprAST(Cond, Then, Else);
+}
+
+
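The shape of this parsing function can be exercised without the rest of the compiler. The sketch below is a simplified stand-in, not the tutorial's parser: `MiniParser` and `parse` are hypothetical names, tokens are pre-split strings, the "AST" is rendered as a parenthesised string, and an empty string plays the role of the null pointer returned on error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature parser used only for illustration.
struct MiniParser {
  std::vector<std::string> Toks;
  std::size_t Pos;

  explicit MiniParser(std::vector<std::string> T)
      : Toks(std::move(T)), Pos(0) {}

  // Current token, or "" once the input is exhausted.
  std::string cur() const { return Pos < Toks.size() ? Toks[Pos] : ""; }
  void next() { if (Pos < Toks.size()) ++Pos; }

  // primary ::= ifexpr | atom
  std::string parsePrimary() {
    std::string T = cur();
    if (T == "if") return parseIf();
    // End of input or a stray keyword cannot start an expression.
    if (T.empty() || T == "then" || T == "else") return "";
    next();  // eat the atom
    return T;
  }

  // ifexpr ::= 'if' primary 'then' primary 'else' primary
  std::string parseIf() {
    assert(cur() == "if");
    next();  // eat the if
    std::string Cond = parsePrimary();
    if (Cond.empty()) return "";
    if (cur() != "then") return "";  // expected then
    next();  // eat the then
    std::string Then = parsePrimary();
    if (Then.empty()) return "";
    if (cur() != "else") return "";  // expected else
    next();  // eat the else
    std::string Else = parsePrimary();
    if (Else.empty()) return "";
    return "(if " + Cond + " " + Then + " " + Else + ")";
  }
};

// Convenience wrapper: parses one expression from a token list.
std::string parse(const std::vector<std::string> &Toks) {
  MiniParser P(Toks);
  return P.parsePrimary();
}
```

Since each sub-expression is parsed by a recursive call, an `if` nested inside a `then` branch needs no special handling, and a missing `else` is reported at the point where the keyword was expected.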
+ +

Next we hook it up as a primary expression:

+ +
+
+static ExprAST *ParsePrimary() {
+  switch (CurTok) {
+  default: return Error("unknown token when expecting an expression");
+  case tok_identifier: return ParseIdentifierExpr();
+  case tok_number:     return ParseNumberExpr();
+  case '(':            return ParseParenExpr();
+  case tok_if:         return ParseIfExpr();
+  }
+}
+
+
+ +
+ + +
LLVM IR for If/Then/Else
+ + +
+ +

Now that we have it parsing and building the AST, the final piece is adding +LLVM code generation support. This is the most interesting part of the +if/then/else example, because this is where it starts to introduce new concepts. +All of the code above has been described in previous chapters fairly thoroughly. +

+ +

To motivate the code we want to produce, let's take a look at a simple +example. Consider:

+ +
+
+extern foo();
+extern bar();
+def baz(x) if x then foo() else bar();
+
+
+ +

If you disable optimizations, the code you'll (soon) get from Kaleidoscope +looks like this:

+ +
+
+declare double @foo()
+
+declare double @bar()
+
+define double @baz(double %x) {
+entry:
+	%ifcond = fcmp one double %x, 0.000000e+00
+	br i1 %ifcond, label %then, label %else
+
+then:		; preds = %entry
+	%calltmp = call double @foo()
+	br label %ifcont
+
+else:		; preds = %entry
+	%calltmp1 = call double @bar()
+	br label %ifcont
+
+ifcont:		; preds = %else, %then
+	%iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
+	ret double %iftmp
+}
+
+
+ +

To visualize the control flow graph, you can use a nifty feature of the LLVM +'opt' tool. If you put this LLVM IR +into "t.ll" and run "llvm-as < t.ll | opt -analyze -view-cfg", a window will pop up and you'll +see this graph:

+ +
Example CFG
+ +

Another way to get this is to call "F->viewCFG()" or +"F->viewCFGOnly()" (where F is a "Function*") either by +inserting actual calls into the code and recompiling or by calling these in the +debugger. LLVM has many nice features for visualizing various graphs.

+ +

Coming back to the generated code, it is fairly simple: the entry block +evaluates the conditional expression ("x" in our case here) and compares the +result to 0.0 with the "fcmp one" +instruction ('one' is "ordered and not equal"). Based on the result of this +expression, the code jumps to either the "then" or "else" blocks, which contain +the expressions for the true/false case.

+ +

Once the then/else blocks are finished executing, they both branch back to the +'ifcont' block to execute the code that happens after the if/then/else. In this +case the only thing left to do is to return to the caller of the function. The +question then becomes: how does the code know which expression to return?

+ +

The answer to this question involves an important SSA operation: the +Phi +operation. If you're not familiar with SSA, the wikipedia +article is a good introduction and there are various other introductions to +it available on your favorite search engine. The short version is that +"execution" of the Phi operation requires "remembering" which block control came +from. The Phi operation takes on the value corresponding to the input control +block. In this case, if control comes in from the "then" block, it gets the +value of "calltmp". If control comes from the "else" block, it gets the value +of "calltmp1".
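One way to build intuition for the Phi operation is to simulate the example by hand. The following is only an illustrative sketch in plain C++: `bazSimulated` is a hypothetical function, `foo` and `bar` are stand-ins with made-up return values, and the variable `Pred` makes explicit the "which block did control come from" bookkeeping that the Phi relies on.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the external functions in the example.
double foo() { return 1.0; }
double bar() { return 2.0; }

// Hand-written simulation of the IR shown for baz. Each commented label
// corresponds to a basic block in that listing.
double bazSimulated(double X) {
  std::string Pred;  // name of the block that branched to ifcont
  double CallTmp = 0.0, CallTmp1 = 0.0;

  // entry:
  bool IfCond = (X == X) && (X != 0.0);  // fcmp one: ordered and not equal
  if (IfCond) {
    // then:
    CallTmp = foo();
    Pred = "then";
  } else {
    // else:
    CallTmp1 = bar();
    Pred = "else";
  }

  // ifcont: the phi takes the value paired with the predecessor block.
  assert(Pred == "then" || Pred == "else");
  double IfTmp = (Pred == "then") ? CallTmp : CallTmp1;
  return IfTmp;
}
```

With these stand-ins, a non-zero argument yields the value produced in the "then" block and a zero argument yields the one produced in the "else" block. Real IR does not store a predecessor name anywhere; the sketch only spells out what the Phi means.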

+ +

At this point, you are probably starting to think "oh no! this means my +simple and elegant front-end will have to start generating SSA form in order to +use LLVM!". Fortunately, this is not the case, and we strongly advise +not implementing an SSA construction algorithm in your front-end +unless there is an amazingly good reason to do so. In practice, there are two +sorts of values that float around in code written in your average imperative +programming language that might need Phi nodes:

+ +
    +
  1. Code that involves user variables: x = 1; x = x + 1;
  2. Values that are implicit in the structure of your AST, such as the phi node +in this case.
+ +

At a future point in this tutorial ("mutable variables"), we'll talk about #1 +in depth. For now, just believe me that you don't need SSA construction to +handle them. For #2, you have the choice of using the techniques that we will +describe for #1, or you can insert Phi nodes directly if convenient. In this +case, it is really really easy to generate the Phi node, so we choose to do it +directly.

+ +

Okay, enough of the motivation and overview, let's generate code!

+ +
+ + +
Code Generation for +If/Then/Else
+ + +
+ +

In order to generate code for this, we implement the Codegen method +for IfExprAST:

+ +
+
+Value *IfExprAST::Codegen() {
+  Value *CondV = Cond->Codegen();
+  if (CondV == 0) return 0;
+  
+  // Convert condition to a bool by comparing non-equal to 0.0.
+  CondV = Builder.CreateFCmpONE(CondV, 
+                                ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
+                                "ifcond");
+
+
+ +

This code is straight-forward and similar to what we saw before. We emit the +expression for the condition, then compare that value to zero to get a truth +value as a 1-bit (bool) value.
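The "ordered and not equal" predicate has one consequence that is easy to miss: a NaN condition counts as false. The sketch below models the predicate in plain C++; `fcmpONE` and `isTruthy` are hypothetical helper names chosen for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cmath>

// Models the "fcmp one" predicate: true only when neither operand is NaN
// (ordered) and the two operands differ (not equal).
bool fcmpONE(double A, double B) {
  bool Ordered = !std::isnan(A) && !std::isnan(B);
  return Ordered && A != B;
}

// Hypothetical helper: the truth value computed for a Kaleidoscope condition
// when it is compared against 0.0 with this predicate.
bool isTruthy(double Cond) { return fcmpONE(Cond, 0.0); }
```

For instance, `assert(isTruthy(3.0))`, `assert(!isTruthy(-0.0))` and `assert(!isTruthy(std::nan("")))` all hold under this model, since negative zero compares equal to 0.0 and NaN is unordered.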

+ +
+
+  Function *TheFunction = Builder.GetInsertBlock()->getParent();
+  
+  // Create blocks for the then and else cases.  Insert the 'then' block at the
+  // end of the function.
+  BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
+  BasicBlock *ElseBB = new BasicBlock("else");
+  BasicBlock *MergeBB = new BasicBlock("ifcont");
+
+  Builder.CreateCondBr(CondV, ThenBB, ElseBB);
+
+
+ +

This code creates the basic blocks that are related to the if/then/else +expression, and correspond directly to the blocks in the example above. The +first line of this gets the current Function object that is being built. It +gets this by asking the builder for the current BasicBlock, and asking that +block for its "parent" (the function it is currently embedded into).

+ +

Once it has that, it creates three blocks. Note that it passes "TheFunction" +into the constructor for the "then" block. This causes the constructor to +automatically insert the new block onto the end of the specified function. The +other two blocks are created, but aren't yet inserted into the function.

+ +

Once the blocks are created, we can emit the conditional branch that chooses +between them. Note that creating new blocks does not implicitly affect the +LLVMBuilder, so it is still inserting into the block that the condition +went into. Also note that it is creating a branch to the "then" block and the +"else" block, even though the "else" block isn't inserted into the function yet. +This is all ok: it is the standard way that LLVM supports forward +references.

+ +
+
+  // Emit then value.
+  Builder.SetInsertPoint(ThenBB);
+  
+  Value *ThenV = Then->Codegen();
+  if (ThenV == 0) return 0;
+  
+  Builder.CreateBr(MergeBB);
+  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
+  ThenBB = Builder.GetInsertBlock();
+
+
+ +

After the conditional branch is inserted, we move the builder to start +inserting into the "then" block. Strictly speaking, this call moves the +insertion point to be at the end of the specified block. However, since the +"then" block is empty, it also starts out by inserting at the beginning of the +block. :)

+ +

Once the insertion point is set, we recursively codegen the "then" expression +from the AST. To finish off the then block, we create an unconditional branch +to the merge block. One interesting (and very important) aspect of the LLVM IR +is that it requires all basic blocks +to be "terminated" with a control flow +instruction such as return or branch. This means that all control flow, +including fall throughs, must be made explicit in the LLVM IR. If you +violate this rule, the verifier will emit an error.
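The terminator rule can be pictured as a simple structural check over the blocks of a function. The following sketch is a heavily simplified model and is not how the LLVM verifier is implemented: `MockBlock`, `isTerminator`, `isWellFormed` and `firstMalformed` are hypothetical names, instructions are reduced to opcode strings, and only "br" and "ret" are treated as terminators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a basic block: a name plus a list of
// opcode names such as "call", "br" or "ret".
struct MockBlock {
  std::string Name;
  std::vector<std::string> Ops;
};

// Opcodes this sketch treats as terminators (real LLVM IR has more).
bool isTerminator(const std::string &Op) { return Op == "br" || Op == "ret"; }

// A block is well formed here if it is non-empty, ends in a terminator, and
// has no terminator before its last instruction.
bool isWellFormed(const MockBlock &BB) {
  if (BB.Ops.empty()) return false;
  for (std::size_t I = 0; I + 1 < BB.Ops.size(); ++I)
    if (isTerminator(BB.Ops[I])) return false;
  return isTerminator(BB.Ops.back());
}

// Returns the name of the first malformed block, or "" if every block passes.
std::string firstMalformed(const std::vector<MockBlock> &Blocks) {
  for (const MockBlock &BB : Blocks) {
    assert(!BB.Name.empty() && "blocks in this sketch must be named");
    if (!isWellFormed(BB)) return BB.Name;
  }
  return "";
}
```

Under this model, a "then" block containing only a call and no branch to the merge block is reported as malformed, which is the situation the unconditional branch in the code above avoids.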

+ +

The final line here is quite subtle, but is very important. The basic issue +is that when we create the Phi node in the merge block, we need to set up the +block/value pairs that indicate how the Phi will work. Importantly, the Phi +node expects to have an entry for each predecessor of the block in the CFG. Why +then are we getting the current block when we just set it to ThenBB 5 lines +above? The problem is that the "Then" expression may actually itself change the +block that the Builder is emitting into if, for example, it contains a nested +"if/then/else" expression. Because calling Codegen recursively could +arbitrarily change the notion of the current block, we are required to get an +up-to-date value for code that will set up the Phi node.
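The effect of a nested expression on the current block can be reproduced with a mock builder. The sketch below is an illustration under simplifying assumptions, not LLVM code: `MockBuilder` and `codegenIf` are hypothetical, blocks are represented by their names only, and the nesting is always placed in the "then" branch.

```cpp
#include <cassert>
#include <string>

// Hypothetical mock of the one piece of builder state that matters here:
// the name of the block currently being inserted into.
struct MockBuilder {
  std::string Current;
  int NextId;
  MockBuilder() : Current("entry"), NextId(0) {}
  void setInsertPoint(const std::string &BB) { Current = BB; }
  const std::string &getInsertBlock() const { return Current; }
};

// Emits a mock if/then/else whose 'then' branch contains Depth further
// nested if/then/else expressions. Returns the block that actually branches
// to this level's merge block from the 'then' side, i.e. the block the Phi
// must name for the 'then' value.
std::string codegenIf(MockBuilder &B, int Depth) {
  assert(Depth >= 0);
  std::string Id = std::to_string(B.NextId++);
  std::string ThenBB = "then" + Id;
  std::string ElseBB = "else" + Id;
  std::string MergeBB = "ifcont" + Id;

  B.setInsertPoint(ThenBB);
  if (Depth > 0)
    codegenIf(B, Depth - 1);  // nested if: leaves the builder in ITS merge block
  std::string ThenEnd = B.getInsertBlock();  // re-read instead of assuming ThenBB

  B.setInsertPoint(ElseBB);   // the 'else' side is a plain value in this sketch
  B.setInsertPoint(MergeBB);  // code after the if/then/else goes here
  return ThenEnd;
}
```

Without nesting, the returned block is simply the original "then" block. With one level of nesting, the returned block is the inner merge block, so a Phi that named the original "then" block would list a block that is not a predecessor of the outer merge block.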

+ +
+
+  // Emit else block.
+  TheFunction->getBasicBlockList().push_back(ElseBB);
+  Builder.SetInsertPoint(ElseBB);
+  
+  Value *ElseV = Else->Codegen();
+  if (ElseV == 0) return 0;
+  
+  Builder.CreateBr(MergeBB);
+  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
+  ElseBB = Builder.GetInsertBlock();
+
+
+ +

Code generation for the 'else' block is basically identical to codegen for +the 'then' block. The only significant difference is the first line, which adds +the 'else' block to the function. Recall previously that the 'else' block was +created, but not added to the function. Now that the 'then' and 'else' blocks +are emitted, we can finish up with the merge code:

+ +
+
+  // Emit merge block.
+  TheFunction->getBasicBlockList().push_back(MergeBB);
+  Builder.SetInsertPoint(MergeBB);
+  PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
+  
+  PN->addIncoming(ThenV, ThenBB);
+  PN->addIncoming(ElseV, ElseBB);
+  return PN;
+}
+
+
+ +

The first two lines here are now familiar: the first adds the "merge" block +to the Function object (it was previously floating, like the else block above). +The second line changes the insertion point so that newly created code will go +into the "merge" block. Once that is done, we need to create the PHI node and +set up the block/value pairs for the PHI.

+ +

Finally, the Codegen function returns the phi node as the value computed by +the if/then/else expression. In our example above, this returned value will +feed into the code for the top-level function, which will create the return +instruction.

+ +

Overall, we now have the ability to execute conditional code in +Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language +that can calculate a wide variety of numeric functions. Next up we'll add +another useful expression that is familiar from non-functional languages...

+ +
+ + +
'for' Loop Expression
+ + +
+ +

...

+ +
+ + +
Full Code Listing
+ + +
+ +

+Here is the complete code listing for our running example, enhanced with the +if/then/else and for expressions. To build this example, use: +

+ +
+
+   # Compile
+   g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
+   # Run
+   ./toy
+
+
+ +

Here is the code:

+ +
+
+...
+
+
+ +
+ + +
+
+ Valid CSS! + Valid HTML 4.01! + + Chris Lattner
+ The LLVM Compiler Infrastructure
+ Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $ +
+ + diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html index 99278fae89a..11e93d3ccd5 100644 --- a/docs/tutorial/index.html +++ b/docs/tutorial/index.html @@ -31,7 +31,7 @@
  • Implementing a Parser and AST
  • Implementing Code Generation to LLVM IR
  • Adding JIT and Optimizer Support
-  • Extending the language: if/then/else
+  • Extending the language: control flow
  • Extending the language: operator overloading
  • Extending the language: mutable variables
  • Thoughts and ideas for extensions