docs/tutorial/LangImpl5.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Control Flow</title>
   7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   8   <meta name="author" content="Chris Lattner">
   9   <link rel="stylesheet" href="../llvm.css" type="text/css">
  10 </head>
  11
  12 <body>
  13
  14 <div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
  15
  16 <div class="doc_author">
  17   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  18 </div>
  19
  20 <!-- *********************************************************************** -->
  21 <div class="doc_section"><a name="intro">Part 5 Introduction</a></div>
  22 <!-- *********************************************************************** -->
  23
  24 <div class="doc_text">
  25
  26 <p>Welcome to Part 5 of the "<a href="index.html">Implementing a language with
  27 LLVM</a>" tutorial.  Parts 1-4 described the implementation of the simple
  28 Kaleidoscope language and included support for generating LLVM IR, following by
  29 optimizations and a JIT compiler.  Unfortunately, as presented, Kaleidoscope is
  30 mostly useless: it has no control flow other than call and return.  This means
  31 that you can't have conditional branches in the code, significantly limiting its
  32 power.  In this episode of "build that compiler", we'll extend Kaleidoscope to
  33 have an if/then/else expression plus a simple looping construct.</p>
  34
  35 </div>
  36
  37 <!-- *********************************************************************** -->
  38 <div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
  39 <!-- *********************************************************************** -->
  40
  41 <div class="doc_text">
  42
  43 <p>
  44 Extending Kaleidoscope to support if/then/else is quite straight-forward.  It
  45 basically requires adding lexer support for this "new" concept to the lexer,
  46 parser, AST, and LLVM code emitter.  This example is nice, because it shows how
  47 easy it is to "grow" a language over time, incrementally extending it as new
  48 ideas are discovered.</p>
  49
  50 <p>Before we get going on "how" we do this extension, lets talk about what we
  51 want.  The basic idea is that we want to be able to write this sort of thing:
  52 </p>
  53
  54 <div class="doc_code">
  55 <pre>
  56 def fib(x)
  57   if x &lt; 3 then
  58     1
  59   else
  60     fib(x-1)+fib(x-2);
  61 </pre>
  62 </div>
  63
  64 <p>In Kaleidoscope, every construct is an expression: there are no statements.
  65 As such, the if/then/else expression needs to return a value like any other.
  66 Since we're using a mostly functional form, we'll have it evaluate its
  67 conditional, then return the 'then' or 'else' value based on how the condition
  68 was resolved.  This is very similar to the C "?:" expression.</p>
  69
  70 <p>The semantics of the if/then/else expression is that it first evaluates the
  71 condition to a boolean equality value: 0.0 is false and everything else is true.
  72 If the condition is true, the first subexpression is evaluated and returned, if
  73 the condition is false, the second subexpression is evaluated and returned.
  74 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
  75 </p>
  76
  77 <p>Now that we know what we want, lets break this down into its constituent
  78 pieces.</p>
  79
  80 </div>
  81
  82 <!-- ======================================================================= -->
  83 <div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
  84 If/Then/Else</a></div>
  85 <!-- ======================================================================= -->
  86
  87
  88 <div class="doc_text">
  89
  90 <p>The lexer extensions are straight-forward.  First we add new enum values
  91 for the relevant tokens:</p>
  92
  93 <div class="doc_code">
  94 <pre>
  95   // control
  96   tok_if = -6, tok_then = -7, tok_else = -8,
  97 </pre>
  98 </div>
  99
 100 <p>Once we have that, we recognize the new keywords in the lexer, pretty simple
 101 stuff:</p>
 102
 103 <div class="doc_code">
 104 <pre>
 105     ...
 106     if (IdentifierStr == "def") return tok_def;
 107     if (IdentifierStr == "extern") return tok_extern;
 108     <b>if (IdentifierStr == "if") return tok_if;
 109     if (IdentifierStr == "then") return tok_then;
 110     if (IdentifierStr == "else") return tok_else;</b>
 111     return tok_identifier;
 112 </pre>
 113 </div>
 114
 115 </div>
 116
 117 <!-- ======================================================================= -->
 118 <div class="doc_subsubsection"><a name="ifast">AST Extensions for
 119  If/Then/Else </a></div>
 120 <!-- ======================================================================= -->
 121
 122 <div class="doc_text">
 123
 124 <p>To represent the new expression we add a new AST node for it:</p>
 125
 126 <div class="doc_code">
 127 <pre>
 128 /// IfExprAST - Expression class for if/then/else.
 129 class IfExprAST : public ExprAST {
 130   ExprAST *Cond, *Then, *Else;
 131 public:
 132   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
 133     : Cond(cond), Then(then), Else(_else) {}
 134   virtual Value *Codegen();
 135 };
 136 </pre>
 137 </div>
 138
 139 <p>The AST node just has pointers to the various subexpressions.</p>
 140
 141 </div>
 142
 143 <!-- ======================================================================= -->
 144 <div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
 145 If/Then/Else </a></div>
 146 <!-- ======================================================================= -->
 147
 148 <div class="doc_text">
 149
 150 <p>Now that we have the relevant tokens coming from the lexer and we have the
 151 AST node to build, our parsing logic is relatively straight-forward.  First we
 152 define a new parsing function:</p>
 153
 154 <div class="doc_code">
 155 <pre>
 156 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
 157 static ExprAST *ParseIfExpr() {
 158   getNextToken();  // eat the if.
 159
 160   // condition.
 161   ExprAST *Cond = ParseExpression();
 162   if (!Cond) return 0;
 163
 164   if (CurTok != tok_then)
 165     return Error("expected then");
 166   getNextToken();  // eat the then
 167
 168   ExprAST *Then = ParseExpression();
 169   if (Then == 0) return 0;
 170
 171   if (CurTok != tok_else)
 172     return Error("expected else");
 173
 174   getNextToken();
 175
 176   ExprAST *Else = ParseExpression();
 177   if (!Else) return 0;
 178
 179   return new IfExprAST(Cond, Then, Else);
 180 }
 181 </pre>
 182 </div>
 183
 184 <p>Next we hook it up as a primary expression:</p>
 185
 186 <div class="doc_code">
 187 <pre>
 188 static ExprAST *ParsePrimary() {
 189   switch (CurTok) {
 190   default: return Error("unknown token when expecting an expression");
 191   case tok_identifier: return ParseIdentifierExpr();
 192   case tok_number:     return ParseNumberExpr();
 193   case '(':            return ParseParenExpr();
 194   <b>case tok_if:         return ParseIfExpr();</b>
 195   }
 196 }
 197 </pre>
 198 </div>
 199
 200 </div>
 201
 202 <!-- ======================================================================= -->
 203 <div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
 204 <!-- ======================================================================= -->
 205
 206 <div class="doc_text">
 207
 208 <p>Now that we have it parsing and building the AST, the final piece is adding
 209 LLVM code generation support.  This is the most interesting part of the
 210 if/then/else example, because this is where it starts to introduce new concepts.
 211 All of the code above has been described in previous chapters fairly thoroughly.
 212 </p>
 213
 214 <p>To motivate the code we want to produce, lets take a look at a simple
 215 example.  Consider:</p>
 216
 217 <div class="doc_code">
 218 <pre>
 219 extern foo();
 220 extern bar();
 221 def baz(x) if x then foo() else bar();
 222 </pre>
 223 </div>
 224
 225 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
 226 looks like this:</p>
 227
 228 <div class="doc_code">
 229 <pre>
 230 declare double @foo()
 231
 232 declare double @bar()
 233
 234 define double @baz(double %x) {
 235 entry:
 236         %ifcond = fcmp one double %x, 0.000000e+00
 237         br i1 %ifcond, label %then, label %else
 238
 239 then:           ; preds = %entry
 240         %calltmp = call double @foo()
 241         br label %ifcont
 242
 243 else:           ; preds = %entry
 244         %calltmp1 = call double @bar()
 245         br label %ifcont
 246
 247 ifcont:         ; preds = %else, %then
 248         %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
 249         ret double %iftmp
 250 }
 251 </pre>
 252 </div>
 253
 254 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
 255 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool.  If you put this LLVM IR
 256 into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
 257 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
 258 see this graph:</p>
 259
 260 <center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
 261 height="315"></center>
 262
 263 <p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
 264 "<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
 265 inserting actual calls into the code and recompiling or by calling these in the
 266 debugger.  LLVM has many nice features for visualizing various graphs.</p>
 267
 268 <p>Coming back to the generated code, it is fairly simple: the entry block
 269 evaluates the conditional expression ("x" in our case here) and compares the
 270 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
 271 instruction ('one' is "ordered and not equal").  Based on the result of this
 272 expression, the code jumps to either the "then" or "else" blocks, which contain
 273 the expressions for the true/false case.</p>
 274
 275 <p>Once the then/else blocks is finished executing, they both branch back to the
 276 else block to execute the code that happens after the if/then/else.  In this
 277 case the only thing left to do is to return to the caller of the function.  The
 278 question then becomes: how does the code know which expression to return?</p>
 279
 280 <p>The answer to this question involves an important SSA operation: the
 281 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
 282 operation</a>.  If you're not familiar with SSA, <a
 283 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
 284 article</a> is a good introduction and there are various other introductions to
 285 it available on your favorite search engine.  The short version is that
 286 "execution" of the Phi operation requires "remembering" which block control came
 287 from.  The Phi operation takes on the value corresponding to the input control
 288 block.  In this case, if control comes in from the "then" block, it gets the
 289 value of "calltmp".  If control comes from the "else" block, it gets the value
 290 of "calltmp1".</p>
 291
 292 <p>At this point, you are probably starting to think "on no! this means my
 293 simple and elegant front-end will have to start generating SSA form in order to
 294 use LLVM!".  Fortunately, this is not the case, and we strongly advise
 295 <em>not</em> implementing an SSA construction algorithm in your front-end
 296 unless there is an amazingly good reason to do so.  In practice, there are two
 297 sorts of values that float around in code written in your average imperative
 298 programming language that might need Phi nodes:</p>
 299
 300 <ol>
 301 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
 302 <li>Values that are implicit in the structure of your AST, such as the phi node
 303 in this case.</li>
 304 </ol>
 305
 306 <p>At a future point in this tutorial ("mutable variables"), we'll talk about #1
 307 in depth.  For now, just believe me that you don't need SSA construction to
 308 handle them.  For #2, you have the choice of using the techniques that we will
 309 describe for #1, or you can insert Phi nodes directly if convenient.  In this
 310 case, it is really really easy to generate the Phi node, so we choose to do it
 311 directly.</p>
 312
 313 <p>Okay, enough of the motivation and overview, lets generate code!</p>
 314
 315 </div>
 316
 317 <!-- ======================================================================= -->
 318 <div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
 319 If/Then/Else</a></div>
 320 <!-- ======================================================================= -->
 321
 322 <div class="doc_text">
 323
 324 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
 325 for <tt>IfExprAST</tt>:</p>
 326
 327 <div class="doc_code">
 328 <pre>
 329 Value *IfExprAST::Codegen() {
 330   Value *CondV = Cond-&gt;Codegen();
 331   if (CondV == 0) return 0;
 332
 333   // Convert condition to a bool by comparing equal to 0.0.
 334   CondV = Builder.CreateFCmpONE(CondV,
 335                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 336                                 "ifcond");
 337 </pre>
 338 </div>
 339
 340 <p>This code is straight-forward and similar to what we saw before.  We emit the
 341 expression for the condition, then compare that value to zero to get a truth
 342 value as a 1-bit (bool) value.</p>
 343
 344 <div class="doc_code">
 345 <pre>
 346   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 347
 348   // Create blocks for the then and else cases.  Insert the 'then' block at the
 349   // end of the function.
 350   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
 351   BasicBlock *ElseBB = new BasicBlock("else");
 352   BasicBlock *MergeBB = new BasicBlock("ifcont");
 353
 354   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
 355 </pre>
 356 </div>
 357
 358 <p>This code creates the basic blocks that are related to the if/then/else
 359 statement, and correspond directly to the blocks in the example above.  The
 360 first line of this gets the current Function object that is being built.  It
 361 gets this by asking the builder for the current BasicBlock, and asking that
 362 block for its "parent" (the function it is currently embedded into).</p>
 363
 364 <p>Once it has that, it creates three blocks.  Note that it passes "TheFunction"
 365 into the constructor for the "then" block.  This causes the constructor to
 366 automatically insert the new block onto the end of the specified function.  The
 367 other two blocks are created, but aren't yet inserted into the function.</p>
 368
 369 <p>Once the blocks are created, we can emit the conditional branch that chooses
 370 between them.  Note that creating new blocks does not implicitly affect the
 371 LLVMBuilder, so it is still inserting into the block that the condition
 372 went into.  Also note that it is creating a branch to the "then" block and the
 373 "else" block, even though the "else" block isn't inserted into the function yet.
 374 This is all ok: it is the standard way that LLVM supports forward
 375 references.</p>
 376
 377 <div class="doc_code">
 378 <pre>
 379   // Emit then value.
 380   Builder.SetInsertPoint(ThenBB);
 381
 382   Value *ThenV = Then-&gt;Codegen();
 383   if (ThenV == 0) return 0;
 384
 385   Builder.CreateBr(MergeBB);
 386   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
 387   ThenBB = Builder.GetInsertBlock();
 388 </pre>
 389 </div>
 390
 391 <p>After the conditional branch is inserted, we move the builder to start
 392 inserting into the "then" block.  Strictly speaking, this call moves the
 393 insertion point to be at the end of the specified block.  However, since the
 394 "then" block is empty, it also starts out by inserting at the beginning of the
 395 block.  :)</p>
 396
 397 <p>Once the insertion point is set, we recursively codegen the "then" expression
 398 from the AST.  To finish off the then block, we create an unconditional branch
 399 to the merge block.  One interesting (and very important) aspect of the LLVM IR
 400 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
 401 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
 402 instruction</a> such as return or branch.  This means that all control flow,
 403 <em>including fall throughs</em> must be made explicit in the LLVM IR.  If you
 404 violate this rule, the verifier will emit an error.</p>
 405
 406 <p>The final line here is quite subtle, but is very important.  The basic issue
 407 is that when we create the Phi node in the merge block, we need to set up the
 408 block/value pairs that indicate how the Phi will work.  Importantly, the Phi
 409 node expects to have an extry for each predecessor of the block in the CFG.  Why
 410 then are we getting the current block when we just set it to ThenBB 5 lines
 411 above?  The problem is that the "Then" expression may actually itself change the
 412 block that the Builder is emitting into if, for example, it contains a nested
 413 "if/then/else" expression.  Because calling Codegen recursively could
 414 arbitrarily change the notion of the current block, we are required to get an
 415 up-to-date value for code that will set up the Phi node.</p>
 416
 417 <div class="doc_code">
 418 <pre>
 419   // Emit else block.
 420   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
 421   Builder.SetInsertPoint(ElseBB);
 422
 423   Value *ElseV = Else-&gt;Codegen();
 424   if (ElseV == 0) return 0;
 425
 426   Builder.CreateBr(MergeBB);
 427   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
 428   ElseBB = Builder.GetInsertBlock();
 429 </pre>
 430 </div>
 431
 432 <p>Code generation for the 'else' block is basically identical to codegen for
 433 the 'then' block.  The only significant difference is the first line, which adds
 434 the 'else' block to the function.  Recall previously that the 'else' block was
 435 created, but not added to the function.  Now that the 'then' and 'else' blocks
 436 are emitted, we can finish up with the merge code:</p>
 437
 438 <div class="doc_code">
 439 <pre>
 440   // Emit merge block.
 441   TheFunction->getBasicBlockList().push_back(MergeBB);
 442   Builder.SetInsertPoint(MergeBB);
 443   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
 444
 445   PN->addIncoming(ThenV, ThenBB);
 446   PN->addIncoming(ElseV, ElseBB);
 447   return PN;
 448 }
 449 </pre>
 450 </div>
 451
 452 <p>The first two lines here are now familiar: the first adds the "merge" block
 453 to the Function object (it was previously floating, like the else block above).
 454 The second block changes the insertion point so that newly created code will go
 455 into the "merge" block.  Once that is done, we need to create the PHI node and
 456 set up the block/value pairs for the PHI.</p>
 457
 458 <p>Finally, the CodeGen function returns the phi node as the value computed by
 459 the if/then/else expression.  In our example above, this returned value will
 460 feed into the code for the top-level function, which will create the return
 461 instruction.</p>
 462
 463 <p>Overall, we now have the ability to execution conditional code in
 464 Kaleidoscope.  With this extension, Kaleidoscope is a fairly complete language
 465 that can calculate a wide variety of numeric functions.  Next up we'll add
 466 another useful expression that is familiar from non-functional languages...</p>
 467
 468 </div>
 469
 470 <!-- *********************************************************************** -->
 471 <div class="doc_section"><a name="for">'for' Loop Expression</a></div>
 472 <!-- *********************************************************************** -->
 473
 474 <div class="doc_text">
 475
 476 <p>Now that we know how to add basic control flow constructs to the language,
 477 we have the tools to add more powerful things.  Lets add something more
 478 aggressive, a 'for' expression:</p>
 479
 480 <div class="doc_code">
 481 <pre>
 482  extern putchard(char)
 483  def printstar(n)
 484    for i = 1, i &lt; n, 1.0 in
 485      putchard(42);  # ascii 42 = '*'
 486
 487  # print 100 '*' characters
 488  printstar(100);
 489 </pre>
 490 </div>
 491
 492 <p>This expression defines a new variable ("i" in this case) which iterates from
 493 a starting value, while the condition ("i &lt; n" in this case) is true,
 494 incrementing by an optional step value ("1.0" in this case).  If the step value
 495 is omitted, it defaults to 1.0.  While the loop is true, it executes its
 496 body expression.  Because we don't have anything better to return, we'll just
 497 define the loop as always returning 0.0.  In the future when we have mutable
 498 variables, it will get more useful.</p>
 499
 500 <p>As before, lets talk about the changes that we need to Kaleidoscope to
 501 support this.</p>
 502
 503 </div>
 504
 505 <!-- ======================================================================= -->
 506 <div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
 507 the 'for' Loop</a></div>
 508 <!-- ======================================================================= -->
 509
 510 <div class="doc_text">
 511
 512 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
 513
 514 <div class="doc_code">
 515 <pre>
 516   ... in enum Token ...
 517   // control
 518   tok_if = -6, tok_then = -7, tok_else = -8,
 519 <b>  tok_for = -9, tok_in = -10</b>
 520
 521   ... in gettok ...
 522   if (IdentifierStr == "def") return tok_def;
 523   if (IdentifierStr == "extern") return tok_extern;
 524   if (IdentifierStr == "if") return tok_if;
 525   if (IdentifierStr == "then") return tok_then;
 526   if (IdentifierStr == "else") return tok_else;
 527   <b>if (IdentifierStr == "for") return tok_for;
 528   if (IdentifierStr == "in") return tok_in;</b>
 529   return tok_identifier;
 530 </pre>
 531 </div>
 532
 533 </div>
 534
 535 <!-- ======================================================================= -->
 536 <div class="doc_subsubsection"><a name="forast">AST Extensions for
 537 the 'for' Loop</a></div>
 538 <!-- ======================================================================= -->
 539
 540 <div class="doc_text">
 541
 542 <p>The AST node is similarly simple.  It basically boils down to capturing
 543 the variable name and the consituent expressions in the node.</p>
 544
 545 <div class="doc_code">
 546 <pre>
 547 /// ForExprAST - Expression class for for/in.
 548 class ForExprAST : public ExprAST {
 549   std::string VarName;
 550   ExprAST *Start, *End, *Step, *Body;
 551 public:
 552   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
 553              ExprAST *step, ExprAST *body)
 554     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
 555   virtual Value *Codegen();
 556 };
 557 </pre>
 558 </div>
 559
 560 </div>
 561
 562 <!-- ======================================================================= -->
 563 <div class="doc_subsubsection"><a name="forparser">Parser Extensions for
 564 the 'for' Loop</a></div>
 565 <!-- ======================================================================= -->
 566
 567 <div class="doc_text">
 568
 569 <p>The parser code is also fairly standard.  The only interesting thing here is
 570 handling of the optional step value.  The parser code handles it by checking to
 571 see if the second comma is present.  If not, it sets the step value to null in
 572 the AST node:</p>
 573
 574 <div class="doc_code">
 575 <pre>
 576 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
 577 static ExprAST *ParseForExpr() {
 578   getNextToken();  // eat the for.
 579
 580   if (CurTok != tok_identifier)
 581     return Error("expected identifier after for");
 582
 583   std::string IdName = IdentifierStr;
 584   getNextToken();  // eat identifer.
 585
 586   if (CurTok != '=')
 587     return Error("expected '=' after for");
 588   getNextToken();  // eat '='.
 589
 590
 591   ExprAST *Start = ParseExpression();
 592   if (Start == 0) return 0;
 593   if (CurTok != ',')
 594     return Error("expected ',' after for start value");
 595   getNextToken();
 596
 597   ExprAST *End = ParseExpression();
 598   if (End == 0) return 0;
 599
 600   // The step value is optional.
 601   ExprAST *Step = 0;
 602   if (CurTok == ',') {
 603     getNextToken();
 604     Step = ParseExpression();
 605     if (Step == 0) return 0;
 606   }
 607
 608   if (CurTok != tok_in)
 609     return Error("expected 'in' after for");
 610   getNextToken();  // eat 'in'.
 611
 612   ExprAST *Body = ParseExpression();
 613   if (Body == 0) return 0;
 614
 615   return new ForExprAST(IdName, Start, End, Step, Body);
 616 }
 617 </pre>
 618 </div>
 619
 620 </div>
 621
 622 <!-- ======================================================================= -->
 623 <div class="doc_subsubsection"><a name="forir">LLVM IR for
 624 the 'for' Loop</a></div>
 625 <!-- ======================================================================= -->
 626
 627 <div class="doc_text">
 628
 629 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
 630 With the simple example above, we get this LLVM IR (note that this dump is
 631 generated with optimizations disabled):
 632 </p>
 633
 634 <div class="doc_code">
 635 <pre>
 636 declare double @putchard(double)
 637
 638 define double @printstar(double %n) {
 639 entry:
 640         ; initial value = 1.0 (inlined into phi)
 641         br label %loop
 642
 643 loop:           ; preds = %loop, %entry
 644         %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
 645         ; body
 646         %calltmp = call double @putchard( double 4.200000e+01 )
 647         ; increment
 648         %nextvar = add double %i, 1.000000e+00
 649
 650         ; termination test
 651         %multmp = fcmp ult double %i, %n
 652         %booltmp = uitofp i1 %multmp to double
 653         %loopcond = fcmp one double %booltmp, 0.000000e+00
 654         br i1 %loopcond, label %loop, label %afterloop
 655
 656 afterloop:              ; preds = %loop
 657         ; loop always returns 0.0
 658         ret double 0.000000e+00
 659 }
 660 </pre>
 661 </div>
 662
 663 <p>This loop contains all the same constructs we saw before: a phi node, several
 664 expressions, and some basic blocks.  Lets see how this fits together.</p>
 665
 666 </div>
 667
 668 <!-- ======================================================================= -->
 669 <div class="doc_subsubsection"><a name="forcodegen">Code Generation for
 670 the 'for' Loop</a></div>
 671 <!-- ======================================================================= -->
 672
 673 <div class="doc_text">
 674
 675 <p>The first part of codegen is very simple: we just output the start expression
 676 for the loop value:</p>
 677
 678 <div class="doc_code">
 679 <pre>
 680 Value *ForExprAST::Codegen() {
 681   // Emit the start code first, without 'variable' in scope.
 682   Value *StartVal = Start-&gt;Codegen();
 683   if (StartVal == 0) return 0;
 684 </pre>
 685 </div>
 686
 687 <p>With this out of the way, the next step is to set up the LLVM basic block
 688 for the start of the loop body.  In the case above, the whole loop body is one
 689 block, but remember that the body code itself could consist of multiple blocks
 690 (e.g. if it is a if/then/else expression).</p>
 691
 692 <div class="doc_code">
 693 <pre>
 694   // Make the new basic block for the loop header, inserting after current
 695   // block.
 696   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 697   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
 698   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
 699
 700   // Insert an explicit fall through from the current block to the LoopBB.
 701   Builder.CreateBr(LoopBB);
 702 </pre>
 703 </div>
 704
 705 <p>This code is similar to what we saw for if/then/else.  Because we will need
 706 it to create the Phi node, we remember the block that falls through into the
 707 loop.  Once we have that, we create the actual block that starts the loop and
 708 create an unconditional branch for the fall-through between the two blocks.</p>
 709
 710 <div class="doc_code">
 711 <pre>
 712   // Start insertion in LoopBB.
 713   Builder.SetInsertPoint(LoopBB);
 714
 715   // Start the PHI node with an entry for Start.
 716   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
 717   Variable-&gt;addIncoming(StartVal, PreheaderBB);
 718 </pre>
 719 </div>
 720
 721 <p>Now that the "preheader" for the loop is set up, we switch to emitting code
 722 for the loop body.  To begin with, we move the insertion point and create the
 723 PHI node for the loop induction variable.  SInce we already know the incoming
 724 value for the starting value, we add it to the Phi node.  Note that the Phi will
 725 eventually get a second value for the backedge, but we can't set it up yet
 726 (because it doesn't exist!).</p>
 727
 728 <div class="doc_code">
 729 <pre>
 730   // Within the loop, the variable is defined equal to the PHI node.  If it
 731   // shadows an existing variable, we have to restore it, so save it now.
 732   Value *OldVal = NamedValues[VarName];
 733   NamedValues[VarName] = Variable;
 734
 735   // Emit the body of the loop.  This, like any other expr, can change the
 736   // current BB.  Note that we ignore the value computed by the body, but don't
 737   // allow an error.
 738   if (Body-&gt;Codegen() == 0)
 739     return 0;
 740 </pre>
 741 </div>
 742
 743 <p>Now the code starts to get more interesting.  Our 'for' loop introduces a new
 744 variable to the symbol table.  This means that our symbol table can now contain
 745 either function arguments or loop variables.  To handle this, before we codegen
 746 the body of the loop, we add the loop variable as the current value for its
 747 name.  Note that it is possible that there is a variable of the same name in the
 748 outer scope.  It would be easy to make this an error (emit an error and return
 749 null if there is already an entry for VarName) but we choose to allow shadowing
 750 of variables.  In order to handle this correctly, we remember the Value that
 751 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
 752 no shadowed variable).</p>
 753
 754 <p>Once the loop variable is set into the symbol table, the code recursively
 755 codegen's the body.  This allows the body to use the loop variable: any
 756 references to it will naturally find it in the symbol table.</p>
 757
 758 <div class="doc_code">
 759 <pre>
 760   // Emit the step value.
 761   Value *StepVal;
 762   if (Step) {
 763     StepVal = Step-&gt;Codegen();
 764     if (StepVal == 0) return 0;
 765   } else {
 766     // If not specified, use 1.0.
 767     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
 768   }
 769
 770   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
 771 </pre>
 772 </div>
 773
 774 <p>Now that the body is emitted, we compute the next value of the iteration
 775 variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
 776 will be the value of the loop variable on the next iteration of the loop.</p>
 777
 778 <div class="doc_code">
 779 <pre>
 780   // Compute the end condition.
 781   Value *EndCond = End-&gt;Codegen();
 782   if (EndCond == 0) return EndCond;
 783
 784   // Convert condition to a bool by comparing equal to 0.0.
 785   EndCond = Builder.CreateFCmpONE(EndCond,
 786                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 787                                   "loopcond");
 788 </pre>
 789 </div>
 790
 791 <p>Finally, we evaluate the exit value of the loop, to determine whether the
 792 loop should exit.  This mirrors the condition evaluation for the if/then/else
 793 statement.</p>
 794
 795 <div class="doc_code">
 796 <pre>
 797   // Create the "after loop" block and insert it.
 798   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
 799   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
 800
 801   // Insert the conditional branch into the end of LoopEndBB.
 802   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
 803
 804   // Any new code will be inserted in AfterBB.
 805   Builder.SetInsertPoint(AfterBB);
 806 </pre>
 807 </div>
 808
 809 <p>With the code for the body of the loop complete, we just need to finish up
 810 the control flow for it.  This remembers the end block (for the phi node), then
 811 creates the block for the loop exit ("afterloop").  Based on the value of the
 812 exit condition, it creates a conditional branch that chooses between executing
 813 the loop again and exiting the loop.  Any future code is emitted in the
 814 "afterloop" block, so it sets the insertion position to it.</p>
 815
 816 <div class="doc_code">
 817 <pre>
 818   // Add a new entry to the PHI node for the backedge.
 819   Variable-&gt;addIncoming(NextVar, LoopEndBB);
 820
 821   // Restore the unshadowed variable.
 822   if (OldVal)
 823     NamedValues[VarName] = OldVal;
 824   else
 825     NamedValues.erase(VarName);
 826
 827   // for expr always returns 0.0.
 828   return Constant::getNullValue(Type::DoubleTy);
 829 }
 830 </pre>
 831 </div>
 832
 833 <p>The final code handles various cleanups: now that we have the "NextVar"
 834 value, we can add the incoming value to the loop PHI node.  After that, we
 835 remove the loop variable from the symbol table, so that it isn't in scope after
 836 the for loop.  Finally, code generation of the for loop always returns 0.0, so
 837 that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
 838
 839 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
 840 the tutorial.  We added two control flow constructs, and used them to motivate
 841 a couple of aspects of the LLVM IR that are important for front-end implementors
 842 to know.  In the next chapter of our saga, we will get a bit crazier and add
 843 operator overloading to our poor innocent language.</p>
 844
 845 </div>
 846
 847 <!-- *********************************************************************** -->
 848 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 849 <!-- *********************************************************************** -->
 850
 851 <div class="doc_text">
 852
 853 <p>
 854 Here is the complete code listing for our running example, enhanced with the
 855 if/then/else and for expressions..  To build this example, use:
 856 </p>
 857
 858 <div class="doc_code">
 859 <pre>
 860    # Compile
 861    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 862    # Run
 863    ./toy
 864 </pre>
 865 </div>
 866
 867 <p>Here is the code:</p>
 868
 869 <div class="doc_code">
 870 <pre>
 871 #include "llvm/DerivedTypes.h"
 872 #include "llvm/ExecutionEngine/ExecutionEngine.h"
 873 #include "llvm/Module.h"
 874 #include "llvm/ModuleProvider.h"
 875 #include "llvm/PassManager.h"
 876 #include "llvm/Analysis/Verifier.h"
 877 #include "llvm/Target/TargetData.h"
 878 #include "llvm/Transforms/Scalar.h"
 879 #include "llvm/Support/LLVMBuilder.h"
 880 #include &lt;cstdio&gt;
 881 #include &lt;string&gt;
 882 #include &lt;map&gt;
 883 #include &lt;vector&gt;
 884 using namespace llvm;
 885
 886 //===----------------------------------------------------------------------===//
 887 // Lexer
 888 //===----------------------------------------------------------------------===//
 889
 890 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 891 // of these for known things.
 892 enum Token {
 893   tok_eof = -1,
 894
 895   // commands
 896   tok_def = -2, tok_extern = -3,
 897
 898   // primary
 899   tok_identifier = -4, tok_number = -5,
 900
 901   // control
 902   tok_if = -6, tok_then = -7, tok_else = -8,
 903   tok_for = -9, tok_in = -10
 904 };
 905
 906 static std::string IdentifierStr;  // Filled in if tok_identifier
 907 static double NumVal;              // Filled in if tok_number
 908
 909 /// gettok - Return the next token from standard input.
 910 static int gettok() {
 911   static int LastChar = ' ';
 912
 913   // Skip any whitespace.
 914   while (isspace(LastChar))
 915     LastChar = getchar();
 916
 917   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
 918     IdentifierStr = LastChar;
 919     while (isalnum((LastChar = getchar())))
 920       IdentifierStr += LastChar;
 921
 922     if (IdentifierStr == "def") return tok_def;
 923     if (IdentifierStr == "extern") return tok_extern;
 924     if (IdentifierStr == "if") return tok_if;
 925     if (IdentifierStr == "then") return tok_then;
 926     if (IdentifierStr == "else") return tok_else;
 927     if (IdentifierStr == "for") return tok_for;
 928     if (IdentifierStr == "in") return tok_in;
 929     return tok_identifier;
 930   }
 931
 932   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
 933     std::string NumStr;
 934     do {
 935       NumStr += LastChar;
 936       LastChar = getchar();
 937     } while (isdigit(LastChar) || LastChar == '.');
 938
 939     NumVal = strtod(NumStr.c_str(), 0);
 940     return tok_number;
 941   }
 942
 943   if (LastChar == '#') {
 944     // Comment until end of line.
 945     do LastChar = getchar();
 946     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
 947
 948     if (LastChar != EOF)
 949       return gettok();
 950   }
 951
 952   // Check for end of file.  Don't eat the EOF.
 953   if (LastChar == EOF)
 954     return tok_eof;
 955
 956   // Otherwise, just return the character as its ascii value.
 957   int ThisChar = LastChar;
 958   LastChar = getchar();
 959   return ThisChar;
 960 }
 961
 962 //===----------------------------------------------------------------------===//
 963 // Abstract Syntax Tree (aka Parse Tree)
 964 //===----------------------------------------------------------------------===//
 965
 966 /// ExprAST - Base class for all expression nodes.
 967 class ExprAST {
 968 public:
 969   virtual ~ExprAST() {}
 970   virtual Value *Codegen() = 0;
 971 };
 972
 973 /// NumberExprAST - Expression class for numeric literals like "1.0".
 974 class NumberExprAST : public ExprAST {
 975   double Val;
 976 public:
 977   NumberExprAST(double val) : Val(val) {}
 978   virtual Value *Codegen();
 979 };
 980
 981 /// VariableExprAST - Expression class for referencing a variable, like "a".
 982 class VariableExprAST : public ExprAST {
 983   std::string Name;
 984 public:
 985   VariableExprAST(const std::string &amp;name) : Name(name) {}
 986   virtual Value *Codegen();
 987 };
 988
 989 /// BinaryExprAST - Expression class for a binary operator.
 990 class BinaryExprAST : public ExprAST {
 991   char Op;
 992   ExprAST *LHS, *RHS;
 993 public:
 994   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
 995     : Op(op), LHS(lhs), RHS(rhs) {}
 996   virtual Value *Codegen();
 997 };
 998
 999 /// CallExprAST - Expression class for function calls.
1000 class CallExprAST : public ExprAST {
1001   std::string Callee;
1002   std::vector&lt;ExprAST*&gt; Args;
1003 public:
1004   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1005     : Callee(callee), Args(args) {}
1006   virtual Value *Codegen();
1007 };
1008
1009 /// IfExprAST - Expression class for if/then/else.
1010 class IfExprAST : public ExprAST {
1011   ExprAST *Cond, *Then, *Else;
1012 public:
1013   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1014   : Cond(cond), Then(then), Else(_else) {}
1015   virtual Value *Codegen();
1016 };
1017
1018 /// ForExprAST - Expression class for for/in.
1019 class ForExprAST : public ExprAST {
1020   std::string VarName;
1021   ExprAST *Start, *End, *Step, *Body;
1022 public:
1023   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1024              ExprAST *step, ExprAST *body)
1025     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1026   virtual Value *Codegen();
1027 };
1028
1029 /// PrototypeAST - This class represents the "prototype" for a function,
1030 /// which captures its argument names as well as if it is an operator.
1031 class PrototypeAST {
1032   std::string Name;
1033   std::vector&lt;std::string&gt; Args;
1034 public:
1035   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1036     : Name(name), Args(args) {}
1037
1038   Function *Codegen();
1039 };
1040
1041 /// FunctionAST - This class represents a function definition itself.
1042 class FunctionAST {
1043   PrototypeAST *Proto;
1044   ExprAST *Body;
1045 public:
1046   FunctionAST(PrototypeAST *proto, ExprAST *body)
1047     : Proto(proto), Body(body) {}
1048
1049   Function *Codegen();
1050 };
1051
1052 //===----------------------------------------------------------------------===//
1053 // Parser
1054 //===----------------------------------------------------------------------===//
1055
1056 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1057 /// token the parser it looking at.  getNextToken reads another token from the
1058 /// lexer and updates CurTok with its results.
1059 static int CurTok;
1060 static int getNextToken() {
1061   return CurTok = gettok();
1062 }
1063
1064 /// BinopPrecedence - This holds the precedence for each binary operator that is
1065 /// defined.
1066 static std::map&lt;char, int&gt; BinopPrecedence;
1067
1068 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1069 static int GetTokPrecedence() {
1070   if (!isascii(CurTok))
1071     return -1;
1072
1073   // Make sure it's a declared binop.
1074   int TokPrec = BinopPrecedence[CurTok];
1075   if (TokPrec &lt;= 0) return -1;
1076   return TokPrec;
1077 }
1078
1079 /// Error* - These are little helper functions for error handling.
1080 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1081 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1082 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1083
1084 static ExprAST *ParseExpression();
1085
1086 /// identifierexpr
1087 ///   ::= identifer
1088 ///   ::= identifer '(' expression* ')'
1089 static ExprAST *ParseIdentifierExpr() {
1090   std::string IdName = IdentifierStr;
1091
1092   getNextToken();  // eat identifer.
1093
1094   if (CurTok != '(') // Simple variable ref.
1095     return new VariableExprAST(IdName);
1096
1097   // Call.
1098   getNextToken();  // eat (
1099   std::vector&lt;ExprAST*&gt; Args;
1100   if (CurTok != ')') {
1101     while (1) {
1102       ExprAST *Arg = ParseExpression();
1103       if (!Arg) return 0;
1104       Args.push_back(Arg);
1105
1106       if (CurTok == ')') break;
1107
1108       if (CurTok != ',')
1109         return Error("Expected ')'");
1110       getNextToken();
1111     }
1112   }
1113
1114   // Eat the ')'.
1115   getNextToken();
1116
1117   return new CallExprAST(IdName, Args);
1118 }
1119
1120 /// numberexpr ::= number
1121 static ExprAST *ParseNumberExpr() {
1122   ExprAST *Result = new NumberExprAST(NumVal);
1123   getNextToken(); // consume the number
1124   return Result;
1125 }
1126
1127 /// parenexpr ::= '(' expression ')'
1128 static ExprAST *ParseParenExpr() {
1129   getNextToken();  // eat (.
1130   ExprAST *V = ParseExpression();
1131   if (!V) return 0;
1132
1133   if (CurTok != ')')
1134     return Error("expected ')'");
1135   getNextToken();  // eat ).
1136   return V;
1137 }
1138
1139 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1140 static ExprAST *ParseIfExpr() {
1141   getNextToken();  // eat the if.
1142
1143   // condition.
1144   ExprAST *Cond = ParseExpression();
1145   if (!Cond) return 0;
1146
1147   if (CurTok != tok_then)
1148     return Error("expected then");
1149   getNextToken();  // eat the then
1150
1151   ExprAST *Then = ParseExpression();
1152   if (Then == 0) return 0;
1153
1154   if (CurTok != tok_else)
1155     return Error("expected else");
1156
1157   getNextToken();
1158
1159   ExprAST *Else = ParseExpression();
1160   if (!Else) return 0;
1161
1162   return new IfExprAST(Cond, Then, Else);
1163 }
1164
1165 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
1166 static ExprAST *ParseForExpr() {
1167   getNextToken();  // eat the for.
1168
1169   if (CurTok != tok_identifier)
1170     return Error("expected identifier after for");
1171
1172   std::string IdName = IdentifierStr;
1173   getNextToken();  // eat identifer.
1174
1175   if (CurTok != '=')
1176     return Error("expected '=' after for");
1177   getNextToken();  // eat '='.
1178
1179
1180   ExprAST *Start = ParseExpression();
1181   if (Start == 0) return 0;
1182   if (CurTok != ',')
1183     return Error("expected ',' after for start value");
1184   getNextToken();
1185
1186   ExprAST *End = ParseExpression();
1187   if (End == 0) return 0;
1188
1189   // The step value is optional.
1190   ExprAST *Step = 0;
1191   if (CurTok == ',') {
1192     getNextToken();
1193     Step = ParseExpression();
1194     if (Step == 0) return 0;
1195   }
1196
1197   if (CurTok != tok_in)
1198     return Error("expected 'in' after for");
1199   getNextToken();  // eat 'in'.
1200
1201   ExprAST *Body = ParseExpression();
1202   if (Body == 0) return 0;
1203
1204   return new ForExprAST(IdName, Start, End, Step, Body);
1205 }
1206
1207
1208 /// primary
1209 ///   ::= identifierexpr
1210 ///   ::= numberexpr
1211 ///   ::= parenexpr
1212 ///   ::= ifexpr
1213 ///   ::= forexpr
1214 static ExprAST *ParsePrimary() {
1215   switch (CurTok) {
1216   default: return Error("unknown token when expecting an expression");
1217   case tok_identifier: return ParseIdentifierExpr();
1218   case tok_number:     return ParseNumberExpr();
1219   case '(':            return ParseParenExpr();
1220   case tok_if:         return ParseIfExpr();
1221   case tok_for:        return ParseForExpr();
1222   }
1223 }
1224
1225 /// binoprhs
1226 ///   ::= ('+' primary)*
1227 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1228   // If this is a binop, find its precedence.
1229   while (1) {
1230     int TokPrec = GetTokPrecedence();
1231
1232     // If this is a binop that binds at least as tightly as the current binop,
1233     // consume it, otherwise we are done.
1234     if (TokPrec &lt; ExprPrec)
1235       return LHS;
1236
1237     // Okay, we know this is a binop.
1238     int BinOp = CurTok;
1239     getNextToken();  // eat binop
1240
1241     // Parse the primary expression after the binary operator.
1242     ExprAST *RHS = ParsePrimary();
1243     if (!RHS) return 0;
1244
1245     // If BinOp binds less tightly with RHS than the operator after RHS, let
1246     // the pending operator take RHS as its LHS.
1247     int NextPrec = GetTokPrecedence();
1248     if (TokPrec &lt; NextPrec) {
1249       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1250       if (RHS == 0) return 0;
1251     }
1252
1253     // Merge LHS/RHS.
1254     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1255   }
1256 }
1257
1258 /// expression
1259 ///   ::= primary binoprhs
1260 ///
1261 static ExprAST *ParseExpression() {
1262   ExprAST *LHS = ParsePrimary();
1263   if (!LHS) return 0;
1264
1265   return ParseBinOpRHS(0, LHS);
1266 }
1267
1268 /// prototype
1269 ///   ::= id '(' id* ')'
1270 static PrototypeAST *ParsePrototype() {
1271   if (CurTok != tok_identifier)
1272     return ErrorP("Expected function name in prototype");
1273
1274   std::string FnName = IdentifierStr;
1275   getNextToken();
1276
1277   if (CurTok != '(')
1278     return ErrorP("Expected '(' in prototype");
1279
1280   std::vector&lt;std::string&gt; ArgNames;
1281   while (getNextToken() == tok_identifier)
1282     ArgNames.push_back(IdentifierStr);
1283   if (CurTok != ')')
1284     return ErrorP("Expected ')' in prototype");
1285
1286   // success.
1287   getNextToken();  // eat ')'.
1288
1289   return new PrototypeAST(FnName, ArgNames);
1290 }
1291
1292 /// definition ::= 'def' prototype expression
1293 static FunctionAST *ParseDefinition() {
1294   getNextToken();  // eat def.
1295   PrototypeAST *Proto = ParsePrototype();
1296   if (Proto == 0) return 0;
1297
1298   if (ExprAST *E = ParseExpression())
1299     return new FunctionAST(Proto, E);
1300   return 0;
1301 }
1302
1303 /// toplevelexpr ::= expression
1304 static FunctionAST *ParseTopLevelExpr() {
1305   if (ExprAST *E = ParseExpression()) {
1306     // Make an anonymous proto.
1307     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1308     return new FunctionAST(Proto, E);
1309   }
1310   return 0;
1311 }
1312
1313 /// external ::= 'extern' prototype
1314 static PrototypeAST *ParseExtern() {
1315   getNextToken();  // eat extern.
1316   return ParsePrototype();
1317 }
1318
1319 //===----------------------------------------------------------------------===//
1320 // Code Generation
1321 //===----------------------------------------------------------------------===//
1322
1323 static Module *TheModule;
1324 static LLVMFoldingBuilder Builder;
1325 static std::map&lt;std::string, Value*&gt; NamedValues;
1326 static FunctionPassManager *TheFPM;
1327
1328 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1329
1330 Value *NumberExprAST::Codegen() {
1331   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1332 }
1333
1334 Value *VariableExprAST::Codegen() {
1335   // Look this variable up in the function.
1336   Value *V = NamedValues[Name];
1337   return V ? V : ErrorV("Unknown variable name");
1338 }
1339
1340 Value *BinaryExprAST::Codegen() {
1341   Value *L = LHS-&gt;Codegen();
1342   Value *R = RHS-&gt;Codegen();
1343   if (L == 0 || R == 0) return 0;
1344
1345   switch (Op) {
1346   case '+': return Builder.CreateAdd(L, R, "addtmp");
1347   case '-': return Builder.CreateSub(L, R, "subtmp");
1348   case '*': return Builder.CreateMul(L, R, "multmp");
1349   case '&lt;':
1350     L = Builder.CreateFCmpULT(L, R, "multmp");
1351     // Convert bool 0/1 to double 0.0 or 1.0
1352     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1353   default: return ErrorV("invalid binary operator");
1354   }
1355 }
1356
1357 Value *CallExprAST::Codegen() {
1358   // Look up the name in the global module table.
1359   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1360   if (CalleeF == 0)
1361     return ErrorV("Unknown function referenced");
1362
1363   // If argument mismatch error.
1364   if (CalleeF-&gt;arg_size() != Args.size())
1365     return ErrorV("Incorrect # arguments passed");
1366
1367   std::vector&lt;Value*&gt; ArgsV;
1368   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1369     ArgsV.push_back(Args[i]-&gt;Codegen());
1370     if (ArgsV.back() == 0) return 0;
1371   }
1372
1373   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1374 }
1375
1376 Value *IfExprAST::Codegen() {
1377   Value *CondV = Cond-&gt;Codegen();
1378   if (CondV == 0) return 0;
1379
1380   // Convert condition to a bool by comparing equal to 0.0.
1381   CondV = Builder.CreateFCmpONE(CondV,
1382                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1383                                 "ifcond");
1384
1385   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1386
1387   // Create blocks for the then and else cases.  Insert the 'then' block at the
1388   // end of the function.
1389   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1390   BasicBlock *ElseBB = new BasicBlock("else");
1391   BasicBlock *MergeBB = new BasicBlock("ifcont");
1392
1393   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1394
1395   // Emit then value.
1396   Builder.SetInsertPoint(ThenBB);
1397
1398   Value *ThenV = Then-&gt;Codegen();
1399   if (ThenV == 0) return 0;
1400
1401   Builder.CreateBr(MergeBB);
1402   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1403   ThenBB = Builder.GetInsertBlock();
1404
1405   // Emit else block.
1406   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1407   Builder.SetInsertPoint(ElseBB);
1408
1409   Value *ElseV = Else-&gt;Codegen();
1410   if (ElseV == 0) return 0;
1411
1412   Builder.CreateBr(MergeBB);
1413   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1414   ElseBB = Builder.GetInsertBlock();
1415
1416   // Emit merge block.
1417   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1418   Builder.SetInsertPoint(MergeBB);
1419   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1420
1421   PN-&gt;addIncoming(ThenV, ThenBB);
1422   PN-&gt;addIncoming(ElseV, ElseBB);
1423   return PN;
1424 }
1425
1426 Value *ForExprAST::Codegen() {
1427   // Output this as:
1428   //   ...
1429   //   start = startexpr
1430   //   goto loop
1431   // loop:
1432   //   variable = phi [start, loopheader], [nextvariable, loopend]
1433   //   ...
1434   //   bodyexpr
1435   //   ...
1436   // loopend:
1437   //   step = stepexpr
1438   //   nextvariable = variable + step
1439   //   endcond = endexpr
1440   //   br endcond, loop, endloop
1441   // outloop:
1442
1443   // Emit the start code first, without 'variable' in scope.
1444   Value *StartVal = Start-&gt;Codegen();
1445   if (StartVal == 0) return 0;
1446
1447   // Make the new basic block for the loop header, inserting after current
1448   // block.
1449   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1450   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1451   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1452
1453   // Insert an explicit fall through from the current block to the LoopBB.
1454   Builder.CreateBr(LoopBB);
1455
1456   // Start insertion in LoopBB.
1457   Builder.SetInsertPoint(LoopBB);
1458
1459   // Start the PHI node with an entry for Start.
1460   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1461   Variable-&gt;addIncoming(StartVal, PreheaderBB);
1462
1463   // Within the loop, the variable is defined equal to the PHI node.  If it
1464   // shadows an existing variable, we have to restore it, so save it now.
1465   Value *OldVal = NamedValues[VarName];
1466   NamedValues[VarName] = Variable;
1467
1468   // Emit the body of the loop.  This, like any other expr, can change the
1469   // current BB.  Note that we ignore the value computed by the body, but don't
1470   // allow an error.
1471   if (Body-&gt;Codegen() == 0)
1472     return 0;
1473
1474   // Emit the step value.
1475   Value *StepVal;
1476   if (Step) {
1477     StepVal = Step-&gt;Codegen();
1478     if (StepVal == 0) return 0;
1479   } else {
1480     // If not specified, use 1.0.
1481     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1482   }
1483
1484   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1485
1486   // Compute the end condition.
1487   Value *EndCond = End-&gt;Codegen();
1488   if (EndCond == 0) return EndCond;
1489
1490   // Convert condition to a bool by comparing equal to 0.0.
1491   EndCond = Builder.CreateFCmpONE(EndCond,
1492                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1493                                   "loopcond");
1494
1495   // Create the "after loop" block and insert it.
1496   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1497   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1498
1499   // Insert the conditional branch into the end of LoopEndBB.
1500   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1501
1502   // Any new code will be inserted in AfterBB.
1503   Builder.SetInsertPoint(AfterBB);
1504
1505   // Add a new entry to the PHI node for the backedge.
1506   Variable-&gt;addIncoming(NextVar, LoopEndBB);
1507
1508   // Restore the unshadowed variable.
1509   if (OldVal)
1510     NamedValues[VarName] = OldVal;
1511   else
1512     NamedValues.erase(VarName);
1513
1514
1515   // for expr always returns 0.0.
1516   return Constant::getNullValue(Type::DoubleTy);
1517 }
1518
1519 Function *PrototypeAST::Codegen() {
1520   // Make the function type:  double(double,double) etc.
1521   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1522   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1523
1524   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1525
1526   // If F conflicted, there was already something named 'Name'.  If it has a
1527   // body, don't allow redefinition or reextern.
1528   if (F-&gt;getName() != Name) {
1529     // Delete the one we just made and get the existing one.
1530     F-&gt;eraseFromParent();
1531     F = TheModule-&gt;getFunction(Name);
1532
1533     // If F already has a body, reject this.
1534     if (!F-&gt;empty()) {
1535       ErrorF("redefinition of function");
1536       return 0;
1537     }
1538
1539     // If F took a different number of args, reject.
1540     if (F-&gt;arg_size() != Args.size()) {
1541       ErrorF("redefinition of function with different # args");
1542       return 0;
1543     }
1544   }
1545
1546   // Set names for all arguments.
1547   unsigned Idx = 0;
1548   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1549        ++AI, ++Idx) {
1550     AI-&gt;setName(Args[Idx]);
1551
1552     // Add arguments to variable symbol table.
1553     NamedValues[Args[Idx]] = AI;
1554   }
1555
1556   return F;
1557 }
1558
1559 Function *FunctionAST::Codegen() {
1560   NamedValues.clear();
1561
1562   Function *TheFunction = Proto-&gt;Codegen();
1563   if (TheFunction == 0)
1564     return 0;
1565
1566   // Create a new basic block to start insertion into.
1567   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1568   Builder.SetInsertPoint(BB);
1569
1570   if (Value *RetVal = Body-&gt;Codegen()) {
1571     // Finish off the function.
1572     Builder.CreateRet(RetVal);
1573
1574     // Validate the generated code, checking for consistency.
1575     verifyFunction(*TheFunction);
1576
1577     // Optimize the function.
1578     TheFPM-&gt;run(*TheFunction);
1579
1580     return TheFunction;
1581   }
1582
1583   // Error reading body, remove function.
1584   TheFunction-&gt;eraseFromParent();
1585   return 0;
1586 }
1587
1588 //===----------------------------------------------------------------------===//
1589 // Top-Level parsing and JIT Driver
1590 //===----------------------------------------------------------------------===//
1591
1592 static ExecutionEngine *TheExecutionEngine;
1593
1594 static void HandleDefinition() {
1595   if (FunctionAST *F = ParseDefinition()) {
1596     if (Function *LF = F-&gt;Codegen()) {
1597       fprintf(stderr, "Read function definition:");
1598       LF-&gt;dump();
1599     }
1600   } else {
1601     // Skip token for error recovery.
1602     getNextToken();
1603   }
1604 }
1605
1606 static void HandleExtern() {
1607   if (PrototypeAST *P = ParseExtern()) {
1608     if (Function *F = P-&gt;Codegen()) {
1609       fprintf(stderr, "Read extern: ");
1610       F-&gt;dump();
1611     }
1612   } else {
1613     // Skip token for error recovery.
1614     getNextToken();
1615   }
1616 }
1617
1618 static void HandleTopLevelExpression() {
1619   // Evaluate a top level expression into an anonymous function.
1620   if (FunctionAST *F = ParseTopLevelExpr()) {
1621     if (Function *LF = F-&gt;Codegen()) {
1622       // JIT the function, returning a function pointer.
1623       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1624
1625       // Cast it to the right type (takes no arguments, returns a double) so we
1626       // can call it as a native function.
1627       double (*FP)() = (double (*)())FPtr;
1628       fprintf(stderr, "Evaluated to %f\n", FP());
1629     }
1630   } else {
1631     // Skip token for error recovery.
1632     getNextToken();
1633   }
1634 }
1635
1636 /// top ::= definition | external | expression | ';'
1637 static void MainLoop() {
1638   while (1) {
1639     fprintf(stderr, "ready&gt; ");
1640     switch (CurTok) {
1641     case tok_eof:    return;
1642     case ';':        getNextToken(); break;  // ignore top level semicolons.
1643     case tok_def:    HandleDefinition(); break;
1644     case tok_extern: HandleExtern(); break;
1645     default:         HandleTopLevelExpression(); break;
1646     }
1647   }
1648 }
1649
1650
1651
1652 //===----------------------------------------------------------------------===//
1653 // "Library" functions that can be "extern'd" from user code.
1654 //===----------------------------------------------------------------------===//
1655
1656 /// putchard - putchar that takes a double and returns 0.
1657 extern "C"
1658 double putchard(double X) {
1659   putchar((char)X);
1660   return 0;
1661 }
1662
1663 //===----------------------------------------------------------------------===//
1664 // Main driver code.
1665 //===----------------------------------------------------------------------===//
1666
1667 int main() {
1668   // Install standard binary operators.
1669   // 1 is lowest precedence.
1670   BinopPrecedence['&lt;'] = 10;
1671   BinopPrecedence['+'] = 20;
1672   BinopPrecedence['-'] = 20;
1673   BinopPrecedence['*'] = 40;  // highest.
1674
1675   // Prime the first token.
1676   fprintf(stderr, "ready&gt; ");
1677   getNextToken();
1678
1679   // Make the module, which holds all the code.
1680   TheModule = new Module("my cool jit");
1681
1682   // Create the JIT.
1683   TheExecutionEngine = ExecutionEngine::create(TheModule);
1684
1685   {
1686     ExistingModuleProvider OurModuleProvider(TheModule);
1687     FunctionPassManager OurFPM(&amp;OurModuleProvider);
1688
1689     // Set up the optimizer pipeline.  Start with registering info about how the
1690     // target lays out data structures.
1691     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1692     // Do simple "peephole" optimizations and bit-twiddling optzns.
1693     OurFPM.add(createInstructionCombiningPass());
1694     // Reassociate expressions.
1695     OurFPM.add(createReassociatePass());
1696     // Eliminate Common SubExpressions.
1697     OurFPM.add(createGVNPass());
1698     // Simplify the control flow graph (deleting unreachable blocks, etc).
1699     OurFPM.add(createCFGSimplificationPass());
1700     // Set the global so the code gen can use this.
1701     TheFPM = &amp;OurFPM;
1702
1703     // Run the main "interpreter loop" now.
1704     MainLoop();
1705
1706     TheFPM = 0;
1707   }  // Free module provider and pass manager.
1708
1709
1710   // Print out all of the generated code.
1711   TheModule-&gt;dump();
1712   return 0;
1713 }
1714 </pre>
1715 </div>
1716
1717 </div>
1718
1719 <!-- *********************************************************************** -->
1720 <hr>
1721 <address>
1722   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1723   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1724   <a href="http://validator.w3.org/check/referer"><img
1725   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1726
1727   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1728   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1729   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1730 </address>
1731 </body>
1732 </html>