1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Extending the Language: Control Flow</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
14 <div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
16 <div class="doc_author">
17 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
20 <!-- *********************************************************************** -->
21 <div class="doc_section"><a name="intro">Part 5 Introduction</a></div>
22 <!-- *********************************************************************** -->
24 <div class="doc_text">
26 <p>Welcome to Part 5 of the "<a href="index.html">Implementing a language with
27 LLVM</a>" tutorial. Parts 1-4 described the implementation of the simple
28 Kaleidoscope language and included support for generating LLVM IR, following by
29 optimizations and a JIT compiler. Unfortunately, as presented, Kaleidoscope is
30 mostly useless: it has no control flow other than call and return. This means
31 that you can't have conditional branches in the code, significantly limiting its
32 power. In this episode of "build that compiler", we'll extend Kaleidoscope to
33 have an if/then/else expression plus a simple looping construct.</p>
37 <!-- *********************************************************************** -->
38 <div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
39 <!-- *********************************************************************** -->
41 <div class="doc_text">
44 Extending Kaleidoscope to support if/then/else is quite straight-forward. It
45 basically requires adding lexer support for this "new" concept to the lexer,
46 parser, AST, and LLVM code emitter. This example is nice, because it shows how
47 easy it is to "grow" a language over time, incrementally extending it as new
48 ideas are discovered.</p>
50 <p>Before we get going on "how" we do this extension, lets talk about what we
51 want. The basic idea is that we want to be able to write this sort of thing:
54 <div class="doc_code">
64 <p>In Kaleidoscope, every construct is an expression: there are no statements.
65 As such, the if/then/else expression needs to return a value like any other.
66 Since we're using a mostly functional form, we'll have it evaluate its
67 conditional, then return the 'then' or 'else' value based on how the condition
68 was resolved. This is very similar to the C "?:" expression.</p>
70 <p>The semantics of the if/then/else expression is that it first evaluates the
71 condition to a boolean equality value: 0.0 is false and everything else is true.
72 If the condition is true, the first subexpression is evaluated and returned, if
73 the condition is false, the second subexpression is evaluated and returned.
74 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
77 <p>Now that we know what we want, lets break this down into its constituent
82 <!-- ======================================================================= -->
83 <div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
84 If/Then/Else</a></div>
85 <!-- ======================================================================= -->
88 <div class="doc_text">
90 <p>The lexer extensions are straight-forward. First we add new enum values
91 for the relevant tokens:</p>
93 <div class="doc_code">
96 tok_if = -6, tok_then = -7, tok_else = -8,
100 <p>Once we have that, we recognize the new keywords in the lexer, pretty simple
103 <div class="doc_code">
106 if (IdentifierStr == "def") return tok_def;
107 if (IdentifierStr == "extern") return tok_extern;
108 <b>if (IdentifierStr == "if") return tok_if;
109 if (IdentifierStr == "then") return tok_then;
110 if (IdentifierStr == "else") return tok_else;</b>
111 return tok_identifier;
117 <!-- ======================================================================= -->
118 <div class="doc_subsubsection"><a name="ifast">AST Extensions for
119 If/Then/Else </a></div>
120 <!-- ======================================================================= -->
122 <div class="doc_text">
124 <p>To represent the new expression we add a new AST node for it:</p>
126 <div class="doc_code">
128 /// IfExprAST - Expression class for if/then/else.
129 class IfExprAST : public ExprAST {
130 ExprAST *Cond, *Then, *Else;
132 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
133 : Cond(cond), Then(then), Else(_else) {}
134 virtual Value *Codegen();
139 <p>The AST node just has pointers to the various subexpressions.</p>
143 <!-- ======================================================================= -->
144 <div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
145 If/Then/Else </a></div>
146 <!-- ======================================================================= -->
148 <div class="doc_text">
150 <p>Now that we have the relevant tokens coming from the lexer and we have the
151 AST node to build, our parsing logic is relatively straight-forward. First we
152 define a new parsing function:</p>
154 <div class="doc_code">
156 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
157 static ExprAST *ParseIfExpr() {
158 getNextToken(); // eat the if.
161 ExprAST *Cond = ParseExpression();
164 if (CurTok != tok_then)
165 return Error("expected then");
166 getNextToken(); // eat the then
168 ExprAST *Then = ParseExpression();
169 if (Then == 0) return 0;
171 if (CurTok != tok_else)
172 return Error("expected else");
176 ExprAST *Else = ParseExpression();
179 return new IfExprAST(Cond, Then, Else);
184 <p>Next we hook it up as a primary expression:</p>
186 <div class="doc_code">
188 static ExprAST *ParsePrimary() {
190 default: return Error("unknown token when expecting an expression");
191 case tok_identifier: return ParseIdentifierExpr();
192 case tok_number: return ParseNumberExpr();
193 case '(': return ParseParenExpr();
194 <b>case tok_if: return ParseIfExpr();</b>
202 <!-- ======================================================================= -->
203 <div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
204 <!-- ======================================================================= -->
206 <div class="doc_text">
208 <p>Now that we have it parsing and building the AST, the final piece is adding
209 LLVM code generation support. This is the most interesting part of the
210 if/then/else example, because this is where it starts to introduce new concepts.
211 All of the code above has been described in previous chapters fairly thoroughly.
214 <p>To motivate the code we want to produce, lets take a look at a simple
215 example. Consider:</p>
217 <div class="doc_code">
221 def baz(x) if x then foo() else bar();
225 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
228 <div class="doc_code">
230 declare double @foo()
232 declare double @bar()
234 define double @baz(double %x) {
236 %ifcond = fcmp one double %x, 0.000000e+00
237 br i1 %ifcond, label %then, label %else
239 then: ; preds = %entry
240 %calltmp = call double @foo()
243 else: ; preds = %entry
244 %calltmp1 = call double @bar()
247 ifcont: ; preds = %else, %then
248 %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
254 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
255 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool. If you put this LLVM IR
256 into "t.ll" and run "<tt>llvm-as < t.ll | opt -analyze -view-cfg</tt>", <a
257 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
260 <center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
261 height="315"></center>
263 <p>Another way to get this is to call "<tt>F->viewCFG()</tt>" or
264 "<tt>F->viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
265 inserting actual calls into the code and recompiling or by calling these in the
266 debugger. LLVM has many nice features for visualizing various graphs.</p>
268 <p>Coming back to the generated code, it is fairly simple: the entry block
269 evaluates the conditional expression ("x" in our case here) and compares the
270 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
271 instruction ('one' is "ordered and not equal"). Based on the result of this
272 expression, the code jumps to either the "then" or "else" blocks, which contain
273 the expressions for the true/false case.</p>
275 <p>Once the then/else blocks is finished executing, they both branch back to the
276 else block to execute the code that happens after the if/then/else. In this
277 case the only thing left to do is to return to the caller of the function. The
278 question then becomes: how does the code know which expression to return?</p>
280 <p>The answer to this question involves an important SSA operation: the
281 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
282 operation</a>. If you're not familiar with SSA, <a
283 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
284 article</a> is a good introduction and there are various other introductions to
285 it available on your favorite search engine. The short version is that
286 "execution" of the Phi operation requires "remembering" which block control came
287 from. The Phi operation takes on the value corresponding to the input control
288 block. In this case, if control comes in from the "then" block, it gets the
289 value of "calltmp". If control comes from the "else" block, it gets the value
292 <p>At this point, you are probably starting to think "on no! this means my
293 simple and elegant front-end will have to start generating SSA form in order to
294 use LLVM!". Fortunately, this is not the case, and we strongly advise
295 <em>not</em> implementing an SSA construction algorithm in your front-end
296 unless there is an amazingly good reason to do so. In practice, there are two
297 sorts of values that float around in code written in your average imperative
298 programming language that might need Phi nodes:</p>
301 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
302 <li>Values that are implicit in the structure of your AST, such as the phi node
306 <p>At a future point in this tutorial ("mutable variables"), we'll talk about #1
307 in depth. For now, just believe me that you don't need SSA construction to
308 handle them. For #2, you have the choice of using the techniques that we will
309 describe for #1, or you can insert Phi nodes directly if convenient. In this
310 case, it is really really easy to generate the Phi node, so we choose to do it
313 <p>Okay, enough of the motivation and overview, lets generate code!</p>
317 <!-- ======================================================================= -->
318 <div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
319 If/Then/Else</a></div>
320 <!-- ======================================================================= -->
322 <div class="doc_text">
324 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
325 for <tt>IfExprAST</tt>:</p>
327 <div class="doc_code">
329 Value *IfExprAST::Codegen() {
330 Value *CondV = Cond->Codegen();
331 if (CondV == 0) return 0;
333 // Convert condition to a bool by comparing equal to 0.0.
334 CondV = Builder.CreateFCmpONE(CondV,
335 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
340 <p>This code is straight-forward and similar to what we saw before. We emit the
341 expression for the condition, then compare that value to zero to get a truth
342 value as a 1-bit (bool) value.</p>
344 <div class="doc_code">
346 Function *TheFunction = Builder.GetInsertBlock()->getParent();
348 // Create blocks for the then and else cases. Insert the 'then' block at the
349 // end of the function.
350 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
351 BasicBlock *ElseBB = new BasicBlock("else");
352 BasicBlock *MergeBB = new BasicBlock("ifcont");
354 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
358 <p>This code creates the basic blocks that are related to the if/then/else
359 statement, and correspond directly to the blocks in the example above. The
360 first line of this gets the current Function object that is being built. It
361 gets this by asking the builder for the current BasicBlock, and asking that
362 block for its "parent" (the function it is currently embedded into).</p>
364 <p>Once it has that, it creates three blocks. Note that it passes "TheFunction"
365 into the constructor for the "then" block. This causes the constructor to
366 automatically insert the new block onto the end of the specified function. The
367 other two blocks are created, but aren't yet inserted into the function.</p>
369 <p>Once the blocks are created, we can emit the conditional branch that chooses
370 between them. Note that creating new blocks does not implicitly affect the
371 LLVMBuilder, so it is still inserting into the block that the condition
372 went into. Also note that it is creating a branch to the "then" block and the
373 "else" block, even though the "else" block isn't inserted into the function yet.
374 This is all ok: it is the standard way that LLVM supports forward
377 <div class="doc_code">
380 Builder.SetInsertPoint(ThenBB);
382 Value *ThenV = Then->Codegen();
383 if (ThenV == 0) return 0;
385 Builder.CreateBr(MergeBB);
386 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
387 ThenBB = Builder.GetInsertBlock();
391 <p>After the conditional branch is inserted, we move the builder to start
392 inserting into the "then" block. Strictly speaking, this call moves the
393 insertion point to be at the end of the specified block. However, since the
394 "then" block is empty, it also starts out by inserting at the beginning of the
397 <p>Once the insertion point is set, we recursively codegen the "then" expression
398 from the AST. To finish off the then block, we create an unconditional branch
399 to the merge block. One interesting (and very important) aspect of the LLVM IR
400 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
401 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
402 instruction</a> such as return or branch. This means that all control flow,
403 <em>including fall throughs</em> must be made explicit in the LLVM IR. If you
404 violate this rule, the verifier will emit an error.</p>
406 <p>The final line here is quite subtle, but is very important. The basic issue
407 is that when we create the Phi node in the merge block, we need to set up the
408 block/value pairs that indicate how the Phi will work. Importantly, the Phi
409 node expects to have an extry for each predecessor of the block in the CFG. Why
410 then are we getting the current block when we just set it to ThenBB 5 lines
411 above? The problem is that the "Then" expression may actually itself change the
412 block that the Builder is emitting into if, for example, it contains a nested
413 "if/then/else" expression. Because calling Codegen recursively could
414 arbitrarily change the notion of the current block, we are required to get an
415 up-to-date value for code that will set up the Phi node.</p>
417 <div class="doc_code">
420 TheFunction->getBasicBlockList().push_back(ElseBB);
421 Builder.SetInsertPoint(ElseBB);
423 Value *ElseV = Else->Codegen();
424 if (ElseV == 0) return 0;
426 Builder.CreateBr(MergeBB);
427 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
428 ElseBB = Builder.GetInsertBlock();
432 <p>Code generation for the 'else' block is basically identical to codegen for
433 the 'then' block. The only significant difference is the first line, which adds
434 the 'else' block to the function. Recall previously that the 'else' block was
435 created, but not added to the function. Now that the 'then' and 'else' blocks
436 are emitted, we can finish up with the merge code:</p>
438 <div class="doc_code">
441 TheFunction->getBasicBlockList().push_back(MergeBB);
442 Builder.SetInsertPoint(MergeBB);
443 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
445 PN->addIncoming(ThenV, ThenBB);
446 PN->addIncoming(ElseV, ElseBB);
452 <p>The first two lines here are now familiar: the first adds the "merge" block
453 to the Function object (it was previously floating, like the else block above).
454 The second block changes the insertion point so that newly created code will go
455 into the "merge" block. Once that is done, we need to create the PHI node and
456 set up the block/value pairs for the PHI.</p>
458 <p>Finally, the CodeGen function returns the phi node as the value computed by
459 the if/then/else expression. In our example above, this returned value will
460 feed into the code for the top-level function, which will create the return
463 <p>Overall, we now have the ability to execution conditional code in
464 Kaleidoscope. With this extension, Kaleidoscope is a fairly complete language
465 that can calculate a wide variety of numeric functions. Next up we'll add
466 another useful expression that is familiar from non-functional languages...</p>
470 <!-- *********************************************************************** -->
471 <div class="doc_section"><a name="for">'for' Loop Expression</a></div>
472 <!-- *********************************************************************** -->
474 <div class="doc_text">
476 <p>Now that we know how to add basic control flow constructs to the language,
477 we have the tools to add more powerful things. Lets add something more
478 aggressive, a 'for' expression:</p>
480 <div class="doc_code">
482 # print 100 '*' (ascii 42) characters
483 extern putchard(char)
484 for x = 1, x < 100, 1.0 in putchard(42);
488 <p>This expression defines a new variable ("x" in this case) which iterates from
489 a starting value, while the condition ("x < 100" in this case) is true,
490 incrementing by an optional step value ("1.0" in this case). If the step value
491 is omitted, it defaults to 1.0. While the loop is true, it executes its
492 body expression. Because we don't have anything better to return, we'll just
493 define the loop as always returning 0.0. In the future when we have mutable
494 variables, it will get more useful.</p>
496 <p>As before, lets talk about the changes that we need to Kaleidoscope to
501 <!-- ======================================================================= -->
502 <div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
503 the 'for' Loop</a></div>
504 <!-- ======================================================================= -->
506 <div class="doc_text">
508 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
510 <div class="doc_code">
512 ... in enum Token ...
514 tok_if = -6, tok_then = -7, tok_else = -8,
515 <b> tok_for = -9, tok_in = -10</b>
518 if (IdentifierStr == "def") return tok_def;
519 if (IdentifierStr == "extern") return tok_extern;
520 if (IdentifierStr == "if") return tok_if;
521 if (IdentifierStr == "then") return tok_then;
522 if (IdentifierStr == "else") return tok_else;
523 <b>if (IdentifierStr == "for") return tok_for;
524 if (IdentifierStr == "in") return tok_in;</b>
525 return tok_identifier;
531 <!-- ======================================================================= -->
532 <div class="doc_subsubsection"><a name="forast">AST Extensions for
533 the 'for' Loop</a></div>
534 <!-- ======================================================================= -->
536 <div class="doc_text">
538 <p>The AST node is similarly simple. It basically boils down to capturing
539 the variable name and the consituent expressions in the node.</p>
541 <div class="doc_code">
543 /// ForExprAST - Expression class for for/in.
544 class ForExprAST : public ExprAST {
546 ExprAST *Start, *End, *Step, *Body;
548 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
549 ExprAST *step, ExprAST *body)
550 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
551 virtual Value *Codegen();
558 <!-- ======================================================================= -->
559 <div class="doc_subsubsection"><a name="forparser">Parser Extensions for
560 the 'for' Loop</a></div>
561 <!-- ======================================================================= -->
563 <div class="doc_text">
565 <p>The parser code is also fairly standard. The only interesting thing here is
566 handling of the optional step value. The parser code handles it by checking to
567 see if the second comma is present. If not, it sets the step value to null in
570 <div class="doc_code">
572 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
573 static ExprAST *ParseForExpr() {
574 getNextToken(); // eat the for.
576 if (CurTok != tok_identifier)
577 return Error("expected identifier after for");
579 std::string IdName = IdentifierStr;
580 getNextToken(); // eat identifer.
583 return Error("expected '=' after for");
584 getNextToken(); // eat '='.
587 ExprAST *Start = ParseExpression();
588 if (Start == 0) return 0;
590 return Error("expected ',' after for start value");
593 ExprAST *End = ParseExpression();
594 if (End == 0) return 0;
596 // The step value is optional.
600 Step = ParseExpression();
601 if (Step == 0) return 0;
604 if (CurTok != tok_in)
605 return Error("expected 'in' after for");
606 getNextToken(); // eat 'in'.
608 ExprAST *Body = ParseExpression();
609 if (Body == 0) return 0;
611 return new ForExprAST(IdName, Start, End, Step, Body);
618 <!-- ======================================================================= -->
619 <div class="doc_subsubsection"><a name="forir">LLVM IR for
620 the 'for' Loop</a></div>
621 <!-- ======================================================================= -->
623 <div class="doc_text">
625 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
632 <!-- ======================================================================= -->
633 <div class="doc_subsubsection"><a name="forcodegen">Code Generation for
634 the 'for' Loop</a></div>
635 <!-- ======================================================================= -->
637 <div class="doc_text">
640 <div class="doc_code">
642 Value *ForExprAST::Codegen() {
648 // variable = phi [start, loopheader], [nextvariable, loopend]
654 // nextvariable = variable + step
656 // br endcond, loop, endloop
659 // Emit the start code first, without 'variable' in scope.
660 Value *StartVal = Start->Codegen();
661 if (StartVal == 0) return 0;
663 // Make the new basic block for the loop header, inserting after current
665 Function *TheFunction = Builder.GetInsertBlock()->getParent();
666 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
667 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
669 // Insert an explicit fall through from the current block to the LoopBB.
670 // Start insertion in LoopBB.
671 Builder.CreateBr(LoopBB);
672 Builder.SetInsertPoint(LoopBB);
674 // Start the PHI node with an entry for Start.
675 PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
676 Variable->addIncoming(StartVal, PreheaderBB);
678 // Within the loop, the variable is defined equal to the PHI node. If it
679 // shadows an existing variable, we have to restore it, so save it now.
680 Value *OldVal = NamedValues[VarName];
681 NamedValues[VarName] = Variable;
683 // Emit the body of the loop. This, like any other expr, can change the
684 // current BB. Note that we ignore the value computed by the body, but don't
686 if (Body->Codegen() == 0)
689 // Emit the step value.
692 StepVal = Step->Codegen();
693 if (StepVal == 0) return 0;
695 // If not specified, use 1.0.
696 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
699 Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
701 // When evaluating the end condition, the value of the variable is the
702 // incremented value.
703 NamedValues[VarName] = Variable;
706 // Compute the end condition.
707 Value *EndCond = End->Codegen();
708 if (EndCond == 0) return EndCond;
710 // Convert condition to a bool by comparing equal to 0.0.
711 EndCond = Builder.CreateFCmpONE(EndCond,
712 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
715 // Create the "after loop" block and insert it.
716 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
717 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
719 // Insert the conditional branch into the end of LoopEndBB.
720 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
722 // Any new code will be inserted in AfterBB.
723 Builder.SetInsertPoint(AfterBB);
725 // Add a new entry to the PHI node for the backedge.
726 Variable->addIncoming(NextVar, LoopEndBB);
728 // Restore the unshadowed variable.
730 NamedValues[VarName] = OldVal;
732 NamedValues.erase(VarName);
735 // for expr always returns 0.0.
736 return Constant::getNullValue(Type::DoubleTy);
744 <!-- *********************************************************************** -->
745 <div class="doc_section"><a name="code">Full Code Listing</a></div>
746 <!-- *********************************************************************** -->
748 <div class="doc_text">
751 Here is the complete code listing for our running example, enhanced with the
752 if/then/else and for expressions.. To build this example, use:
755 <div class="doc_code">
758 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
764 <p>Here is the code:</p>
766 <div class="doc_code">
774 <!-- *********************************************************************** -->
777 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
778 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
779 <a href="http://validator.w3.org/check/referer"><img
780 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
782 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
783 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
784 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $