docs/tutorial/LangImpl5.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Control Flow</title>
   7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   8   <meta name="author" content="Chris Lattner">
   9   <link rel="stylesheet" href="../llvm.css" type="text/css">
  10 </head>
  11
  12 <body>
  13
  14 <div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
  15
  16 <ul>
  17 <li><a href="index.html">Up to Tutorial Index</a></li>
  18 <li>Chapter 5
  19   <ol>
  20     <li><a href="#intro">Chapter 5 Introduction</a></li>
  21     <li><a href="#ifthen">If/Then/Else</a>
  22     <ol>
  23       <li><a href="#iflexer">Lexer Extensions</a></li>
  24       <li><a href="#ifast">AST Extensions</a></li>
  25       <li><a href="#ifparser">Parser Extensions</a></li>
  26       <li><a href="#ifir">LLVM IR</a></li>
  27       <li><a href="#ifcodegen">Code Generation</a></li>
  28     </ol>
  29     </li>
  30     <li><a href="#for">'for' Loop Expression</a>
  31     <ol>
  32       <li><a href="#forlexer">Lexer Extensions</a></li>
  33       <li><a href="#forast">AST Extensions</a></li>
  34       <li><a href="#forparser">Parser Extensions</a></li>
  35       <li><a href="#forir">LLVM IR</a></li>
  36       <li><a href="#forcodegen">Code Generation</a></li>
  37     </ol>
  38     </li>
  39     <li><a href="#code">Full Code Listing</a></li>
  40   </ol>
  41 </li>
  42 <li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language:
  43 User-defined Operators</li>
  44 </ul>
  45
  46 <div class="doc_author">
  47   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  48 </div>
  49
  50 <!-- *********************************************************************** -->
  51 <div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
  52 <!-- *********************************************************************** -->
  53
  54 <div class="doc_text">
  55
  56 <p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
  57 with LLVM</a>" tutorial.  Parts 1-4 described the implementation of the simple
  58 Kaleidoscope language and included support for generating LLVM IR, following by
  59 optimizations and a JIT compiler.  Unfortunately, as presented, Kaleidoscope is
  60 mostly useless: it has no control flow other than call and return.  This means
  61 that you can't have conditional branches in the code, significantly limiting its
  62 power.  In this episode of "build that compiler", we'll extend Kaleidoscope to
  63 have an if/then/else expression plus a simple 'for' loop.</p>
  64
  65 </div>
  66
  67 <!-- *********************************************************************** -->
  68 <div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
  69 <!-- *********************************************************************** -->
  70
  71 <div class="doc_text">
  72
  73 <p>
  74 Extending Kaleidoscope to support if/then/else is quite straight-forward.  It
  75 basically requires adding lexer support for this "new" concept to the lexer,
  76 parser, AST, and LLVM code emitter.  This example is nice, because it shows how
  77 easy it is to "grow" a language over time, incrementally extending it as new
  78 ideas are discovered.</p>
  79
  80 <p>Before we get going on "how" we do this extension, lets talk about what we
  81 want.  The basic idea is that we want to be able to write this sort of thing:
  82 </p>
  83
  84 <div class="doc_code">
  85 <pre>
  86 def fib(x)
  87   if x &lt; 3 then
  88     1
  89   else
  90     fib(x-1)+fib(x-2);
  91 </pre>
  92 </div>
  93
  94 <p>In Kaleidoscope, every construct is an expression: there are no statements.
  95 As such, the if/then/else expression needs to return a value like any other.
  96 Since we're using a mostly functional form, we'll have it evaluate its
  97 conditional, then return the 'then' or 'else' value based on how the condition
  98 was resolved.  This is very similar to the C "?:" expression.</p>
  99
 100 <p>The semantics of the if/then/else expression is that it first evaluates the
 101 condition to a boolean equality value: 0.0 is considered to be false and
 102 everything else is considered to be true.
 103 If the condition is true, the first subexpression is evaluated and returned, if
 104 the condition is false, the second subexpression is evaluated and returned.
 105 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
 106 </p>
 107
 108 <p>Now that we know what we want, lets break this down into its constituent
 109 pieces.</p>
 110
 111 </div>
 112
 113 <!-- ======================================================================= -->
 114 <div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
 115 If/Then/Else</a></div>
 116 <!-- ======================================================================= -->
 117
 118
 119 <div class="doc_text">
 120
 121 <p>The lexer extensions are straight-forward.  First we add new enum values
 122 for the relevant tokens:</p>
 123
 124 <div class="doc_code">
 125 <pre>
 126   // control
 127   tok_if = -6, tok_then = -7, tok_else = -8,
 128 </pre>
 129 </div>
 130
 131 <p>Once we have that, we recognize the new keywords in the lexer, pretty simple
 132 stuff:</p>
 133
 134 <div class="doc_code">
 135 <pre>
 136     ...
 137     if (IdentifierStr == "def") return tok_def;
 138     if (IdentifierStr == "extern") return tok_extern;
 139     <b>if (IdentifierStr == "if") return tok_if;
 140     if (IdentifierStr == "then") return tok_then;
 141     if (IdentifierStr == "else") return tok_else;</b>
 142     return tok_identifier;
 143 </pre>
 144 </div>
 145
 146 </div>
 147
 148 <!-- ======================================================================= -->
 149 <div class="doc_subsubsection"><a name="ifast">AST Extensions for
 150  If/Then/Else</a></div>
 151 <!-- ======================================================================= -->
 152
 153 <div class="doc_text">
 154
 155 <p>To represent the new expression we add a new AST node for it:</p>
 156
 157 <div class="doc_code">
 158 <pre>
 159 /// IfExprAST - Expression class for if/then/else.
 160 class IfExprAST : public ExprAST {
 161   ExprAST *Cond, *Then, *Else;
 162 public:
 163   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
 164     : Cond(cond), Then(then), Else(_else) {}
 165   virtual Value *Codegen();
 166 };
 167 </pre>
 168 </div>
 169
 170 <p>The AST node just has pointers to the various subexpressions.</p>
 171
 172 </div>
 173
 174 <!-- ======================================================================= -->
 175 <div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
 176 If/Then/Else</a></div>
 177 <!-- ======================================================================= -->
 178
 179 <div class="doc_text">
 180
 181 <p>Now that we have the relevant tokens coming from the lexer and we have the
 182 AST node to build, our parsing logic is relatively straight-forward.  First we
 183 define a new parsing function:</p>
 184
 185 <div class="doc_code">
 186 <pre>
 187 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
 188 static ExprAST *ParseIfExpr() {
 189   getNextToken();  // eat the if.
 190
 191   // condition.
 192   ExprAST *Cond = ParseExpression();
 193   if (!Cond) return 0;
 194
 195   if (CurTok != tok_then)
 196     return Error("expected then");
 197   getNextToken();  // eat the then
 198
 199   ExprAST *Then = ParseExpression();
 200   if (Then == 0) return 0;
 201
 202   if (CurTok != tok_else)
 203     return Error("expected else");
 204
 205   getNextToken();
 206
 207   ExprAST *Else = ParseExpression();
 208   if (!Else) return 0;
 209
 210   return new IfExprAST(Cond, Then, Else);
 211 }
 212 </pre>
 213 </div>
 214
 215 <p>Next we hook it up as a primary expression:</p>
 216
 217 <div class="doc_code">
 218 <pre>
 219 static ExprAST *ParsePrimary() {
 220   switch (CurTok) {
 221   default: return Error("unknown token when expecting an expression");
 222   case tok_identifier: return ParseIdentifierExpr();
 223   case tok_number:     return ParseNumberExpr();
 224   case '(':            return ParseParenExpr();
 225   <b>case tok_if:         return ParseIfExpr();</b>
 226   }
 227 }
 228 </pre>
 229 </div>
 230
 231 </div>
 232
 233 <!-- ======================================================================= -->
 234 <div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
 235 <!-- ======================================================================= -->
 236
 237 <div class="doc_text">
 238
 239 <p>Now that we have it parsing and building the AST, the final piece is adding
 240 LLVM code generation support.  This is the most interesting part of the
 241 if/then/else example, because this is where it starts to introduce new concepts.
 242 All of the code above has been thoroughly described in previous chapters.
 243 </p>
 244
 245 <p>To motivate the code we want to produce, lets take a look at a simple
 246 example.  Consider:</p>
 247
 248 <div class="doc_code">
 249 <pre>
 250 extern foo();
 251 extern bar();
 252 def baz(x) if x then foo() else bar();
 253 </pre>
 254 </div>
 255
 256 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
 257 looks like this:</p>
 258
 259 <div class="doc_code">
 260 <pre>
 261 declare double @foo()
 262
 263 declare double @bar()
 264
 265 define double @baz(double %x) {
 266 entry:
 267         %ifcond = fcmp one double %x, 0.000000e+00
 268         br i1 %ifcond, label %then, label %else
 269
 270 then:           ; preds = %entry
 271         %calltmp = call double @foo()
 272         br label %ifcont
 273
 274 else:           ; preds = %entry
 275         %calltmp1 = call double @bar()
 276         br label %ifcont
 277
 278 ifcont:         ; preds = %else, %then
 279         %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
 280         ret double %iftmp
 281 }
 282 </pre>
 283 </div>
 284
 285 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
 286 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool.  If you put this LLVM IR
 287 into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
 288 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
 289 see this graph:</p>
 290
 291 <center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
 292 height="315"></center>
 293
 294 <p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
 295 "<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
 296 inserting actual calls into the code and recompiling or by calling these in the
 297 debugger.  LLVM has many nice features for visualizing various graphs.</p>
 298
 299 <p>Coming back to the generated code, it is fairly simple: the entry block
 300 evaluates the conditional expression ("x" in our case here) and compares the
 301 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
 302 instruction ('one' is "Ordered and Not Equal").  Based on the result of this
 303 expression, the code jumps to either the "then" or "else" blocks, which contain
 304 the expressions for the true/false cases.</p>
 305
 306 <p>Once the then/else blocks is finished executing, they both branch back to the
 307 'ifcont' block to execute the code that happens after the if/then/else.  In this
 308 case the only thing left to do is to return to the caller of the function.  The
 309 question then becomes: how does the code know which expression to return?</p>
 310
 311 <p>The answer to this question involves an important SSA operation: the
 312 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
 313 operation</a>.  If you're not familiar with SSA, <a
 314 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
 315 article</a> is a good introduction and there are various other introductions to
 316 it available on your favorite search engine.  The short version is that
 317 "execution" of the Phi operation requires "remembering" which block control came
 318 from.  The Phi operation takes on the value corresponding to the input control
 319 block.  In this case, if control comes in from the "then" block, it gets the
 320 value of "calltmp".  If control comes from the "else" block, it gets the value
 321 of "calltmp1".</p>
 322
 323 <p>At this point, you are probably starting to think "oh no! this means my
 324 simple and elegant front-end will have to start generating SSA form in order to
 325 use LLVM!".  Fortunately, this is not the case, and we strongly advise
 326 <em>not</em> implementing an SSA construction algorithm in your front-end
 327 unless there is an amazingly good reason to do so.  In practice, there are two
 328 sorts of values that float around in code written in your average imperative
 329 programming language that might need Phi nodes:</p>
 330
 331 <ol>
 332 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
 333 <li>Values that are implicit in the structure of your AST, such as the phi node
 334 in this case.</li>
 335 </ol>
 336
 337 <p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
 338 variables"), we'll talk about #1
 339 in depth.  For now, just believe me that you don't need SSA construction to
 340 handle them.  For #2, you have the choice of using the techniques that we will
 341 describe for #1, or you can insert Phi nodes directly if convenient.  In this
 342 case, it is really really easy to generate the Phi node, so we choose to do it
 343 directly.</p>
 344
 345 <p>Okay, enough of the motivation and overview, lets generate code!</p>
 346
 347 </div>
 348
 349 <!-- ======================================================================= -->
 350 <div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
 351 If/Then/Else</a></div>
 352 <!-- ======================================================================= -->
 353
 354 <div class="doc_text">
 355
 356 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
 357 for <tt>IfExprAST</tt>:</p>
 358
 359 <div class="doc_code">
 360 <pre>
 361 Value *IfExprAST::Codegen() {
 362   Value *CondV = Cond-&gt;Codegen();
 363   if (CondV == 0) return 0;
 364
 365   // Convert condition to a bool by comparing equal to 0.0.
 366   CondV = Builder.CreateFCmpONE(CondV,
 367                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 368                                 "ifcond");
 369 </pre>
 370 </div>
 371
 372 <p>This code is straight-forward and similar to what we saw before.  We emit the
 373 expression for the condition, then compare that value to zero to get a truth
 374 value as a 1-bit (bool) value.</p>
 375
 376 <div class="doc_code">
 377 <pre>
 378   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 379
 380   // Create blocks for the then and else cases.  Insert the 'then' block at the
 381   // end of the function.
 382   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
 383   BasicBlock *ElseBB = new BasicBlock("else");
 384   BasicBlock *MergeBB = new BasicBlock("ifcont");
 385
 386   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
 387 </pre>
 388 </div>
 389
 390 <p>This code creates the basic blocks that are related to the if/then/else
 391 statement, and correspond directly to the blocks in the example above.  The
 392 first line gets the current Function object that is being built.  It
 393 gets this by asking the builder for the current BasicBlock, and asking that
 394 block for its "parent" (the function it is currently embedded into).</p>
 395
 396 <p>Once it has that, it creates three blocks.  Note that it passes "TheFunction"
 397 into the constructor for the "then" block.  This causes the constructor to
 398 automatically insert the new block onto the end of the specified function.  The
 399 other two blocks are created, but aren't yet inserted into the function.</p>
 400
 401 <p>Once the blocks are created, we can emit the conditional branch that chooses
 402 between them.  Note that creating new blocks does not implicitly affect the
 403 LLVMBuilder, so it is still inserting into the block that the condition
 404 went into.  Also note that it is creating a branch to the "then" block and the
 405 "else" block, even though the "else" block isn't inserted into the function yet.
 406 This is all ok: it is the standard way that LLVM supports forward
 407 references.</p>
 408
 409 <div class="doc_code">
 410 <pre>
 411   // Emit then value.
 412   Builder.SetInsertPoint(ThenBB);
 413
 414   Value *ThenV = Then-&gt;Codegen();
 415   if (ThenV == 0) return 0;
 416
 417   Builder.CreateBr(MergeBB);
 418   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
 419   ThenBB = Builder.GetInsertBlock();
 420 </pre>
 421 </div>
 422
 423 <p>After the conditional branch is inserted, we move the builder to start
 424 inserting into the "then" block.  Strictly speaking, this call moves the
 425 insertion point to be at the end of the specified block.  However, since the
 426 "then" block is empty, it also starts out by inserting at the beginning of the
 427 block.  :)</p>
 428
 429 <p>Once the insertion point is set, we recursively codegen the "then" expression
 430 from the AST.  To finish off the then block, we create an unconditional branch
 431 to the merge block.  One interesting (and very important) aspect of the LLVM IR
 432 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
 433 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
 434 instruction</a> such as return or branch.  This means that all control flow,
 435 <em>including fall throughs</em> must be made explicit in the LLVM IR.  If you
 436 violate this rule, the verifier will emit an error.</p>
 437
 438 <p>The final line here is quite subtle, but is very important.  The basic issue
 439 is that when we create the Phi node in the merge block, we need to set up the
 440 block/value pairs that indicate how the Phi will work.  Importantly, the Phi
 441 node expects to have an entry for each predecessor of the block in the CFG.  Why
 442 then are we getting the current block when we just set it to ThenBB 5 lines
 443 above?  The problem is that the "Then" expression may actually itself change the
 444 block that the Builder is emitting into if, for example, it contains a nested
 445 "if/then/else" expression.  Because calling Codegen recursively could
 446 arbitrarily change the notion of the current block, we are required to get an
 447 up-to-date value for code that will set up the Phi node.</p>
 448
 449 <div class="doc_code">
 450 <pre>
 451   // Emit else block.
 452   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
 453   Builder.SetInsertPoint(ElseBB);
 454
 455   Value *ElseV = Else-&gt;Codegen();
 456   if (ElseV == 0) return 0;
 457
 458   Builder.CreateBr(MergeBB);
 459   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
 460   ElseBB = Builder.GetInsertBlock();
 461 </pre>
 462 </div>
 463
 464 <p>Code generation for the 'else' block is basically identical to codegen for
 465 the 'then' block.  The only significant difference is the first line, which adds
 466 the 'else' block to the function.  Recall previously that the 'else' block was
 467 created, but not added to the function.  Now that the 'then' and 'else' blocks
 468 are emitted, we can finish up with the merge code:</p>
 469
 470 <div class="doc_code">
 471 <pre>
 472   // Emit merge block.
 473   TheFunction->getBasicBlockList().push_back(MergeBB);
 474   Builder.SetInsertPoint(MergeBB);
 475   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
 476
 477   PN->addIncoming(ThenV, ThenBB);
 478   PN->addIncoming(ElseV, ElseBB);
 479   return PN;
 480 }
 481 </pre>
 482 </div>
 483
 484 <p>The first two lines here are now familiar: the first adds the "merge" block
 485 to the Function object (it was previously floating, like the else block above).
 486 The second block changes the insertion point so that newly created code will go
 487 into the "merge" block.  Once that is done, we need to create the PHI node and
 488 set up the block/value pairs for the PHI.</p>
 489
 490 <p>Finally, the CodeGen function returns the phi node as the value computed by
 491 the if/then/else expression.  In our example above, this returned value will
 492 feed into the code for the top-level function, which will create the return
 493 instruction.</p>
 494
 495 <p>Overall, we now have the ability to execution conditional code in
 496 Kaleidoscope.  With this extension, Kaleidoscope is a fairly complete language
 497 that can calculate a wide variety of numeric functions.  Next up we'll add
 498 another useful expression that is familiar from non-functional languages...</p>
 499
 500 </div>
 501
 502 <!-- *********************************************************************** -->
 503 <div class="doc_section"><a name="for">'for' Loop Expression</a></div>
 504 <!-- *********************************************************************** -->
 505
 506 <div class="doc_text">
 507
 508 <p>Now that we know how to add basic control flow constructs to the language,
 509 we have the tools to add more powerful things.  Lets add something more
 510 aggressive, a 'for' expression:</p>
 511
 512 <div class="doc_code">
 513 <pre>
 514  extern putchard(char)
 515  def printstar(n)
 516    for i = 1, i &lt; n, 1.0 in
 517      putchard(42);  # ascii 42 = '*'
 518
 519  # print 100 '*' characters
 520  printstar(100);
 521 </pre>
 522 </div>
 523
 524 <p>This expression defines a new variable ("i" in this case) which iterates from
 525 a starting value, while the condition ("i &lt; n" in this case) is true,
 526 incrementing by an optional step value ("1.0" in this case).  If the step value
 527 is omitted, it defaults to 1.0.  While the loop is true, it executes its
 528 body expression.  Because we don't have anything better to return, we'll just
 529 define the loop as always returning 0.0.  In the future when we have mutable
 530 variables, it will get more useful.</p>
 531
 532 <p>As before, lets talk about the changes that we need to Kaleidoscope to
 533 support this.</p>
 534
 535 </div>
 536
 537 <!-- ======================================================================= -->
 538 <div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
 539 the 'for' Loop</a></div>
 540 <!-- ======================================================================= -->
 541
 542 <div class="doc_text">
 543
 544 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
 545
 546 <div class="doc_code">
 547 <pre>
 548   ... in enum Token ...
 549   // control
 550   tok_if = -6, tok_then = -7, tok_else = -8,
 551 <b>  tok_for = -9, tok_in = -10</b>
 552
 553   ... in gettok ...
 554   if (IdentifierStr == "def") return tok_def;
 555   if (IdentifierStr == "extern") return tok_extern;
 556   if (IdentifierStr == "if") return tok_if;
 557   if (IdentifierStr == "then") return tok_then;
 558   if (IdentifierStr == "else") return tok_else;
 559   <b>if (IdentifierStr == "for") return tok_for;
 560   if (IdentifierStr == "in") return tok_in;</b>
 561   return tok_identifier;
 562 </pre>
 563 </div>
 564
 565 </div>
 566
 567 <!-- ======================================================================= -->
 568 <div class="doc_subsubsection"><a name="forast">AST Extensions for
 569 the 'for' Loop</a></div>
 570 <!-- ======================================================================= -->
 571
 572 <div class="doc_text">
 573
 574 <p>The AST node is similarly simple.  It basically boils down to capturing
 575 the variable name and the constituent expressions in the node.</p>
 576
 577 <div class="doc_code">
 578 <pre>
 579 /// ForExprAST - Expression class for for/in.
 580 class ForExprAST : public ExprAST {
 581   std::string VarName;
 582   ExprAST *Start, *End, *Step, *Body;
 583 public:
 584   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
 585              ExprAST *step, ExprAST *body)
 586     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
 587   virtual Value *Codegen();
 588 };
 589 </pre>
 590 </div>
 591
 592 </div>
 593
 594 <!-- ======================================================================= -->
 595 <div class="doc_subsubsection"><a name="forparser">Parser Extensions for
 596 the 'for' Loop</a></div>
 597 <!-- ======================================================================= -->
 598
 599 <div class="doc_text">
 600
 601 <p>The parser code is also fairly standard.  The only interesting thing here is
 602 handling of the optional step value.  The parser code handles it by checking to
 603 see if the second comma is present.  If not, it sets the step value to null in
 604 the AST node:</p>
 605
 606 <div class="doc_code">
 607 <pre>
 608 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
 609 static ExprAST *ParseForExpr() {
 610   getNextToken();  // eat the for.
 611
 612   if (CurTok != tok_identifier)
 613     return Error("expected identifier after for");
 614
 615   std::string IdName = IdentifierStr;
 616   getNextToken();  // eat identifier.
 617
 618   if (CurTok != '=')
 619     return Error("expected '=' after for");
 620   getNextToken();  // eat '='.
 621
 622
 623   ExprAST *Start = ParseExpression();
 624   if (Start == 0) return 0;
 625   if (CurTok != ',')
 626     return Error("expected ',' after for start value");
 627   getNextToken();
 628
 629   ExprAST *End = ParseExpression();
 630   if (End == 0) return 0;
 631
 632   // The step value is optional.
 633   ExprAST *Step = 0;
 634   if (CurTok == ',') {
 635     getNextToken();
 636     Step = ParseExpression();
 637     if (Step == 0) return 0;
 638   }
 639
 640   if (CurTok != tok_in)
 641     return Error("expected 'in' after for");
 642   getNextToken();  // eat 'in'.
 643
 644   ExprAST *Body = ParseExpression();
 645   if (Body == 0) return 0;
 646
 647   return new ForExprAST(IdName, Start, End, Step, Body);
 648 }
 649 </pre>
 650 </div>
 651
 652 </div>
 653
 654 <!-- ======================================================================= -->
 655 <div class="doc_subsubsection"><a name="forir">LLVM IR for
 656 the 'for' Loop</a></div>
 657 <!-- ======================================================================= -->
 658
 659 <div class="doc_text">
 660
 661 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
 662 With the simple example above, we get this LLVM IR (note that I generated this
 663 dump is generated with optimizations disabled for clarity):
 664 </p>
 665
 666 <div class="doc_code">
 667 <pre>
 668 declare double @putchard(double)
 669
 670 define double @printstar(double %n) {
 671 entry:
 672         ; initial value = 1.0 (inlined into phi)
 673         br label %loop
 674
 675 loop:           ; preds = %loop, %entry
 676         %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
 677         ; body
 678         %calltmp = call double @putchard( double 4.200000e+01 )
 679         ; increment
 680         %nextvar = add double %i, 1.000000e+00
 681
 682         ; termination test
 683         %cmptmp = fcmp ult double %i, %n
 684         %booltmp = uitofp i1 %cmptmp to double
 685         %loopcond = fcmp one double %booltmp, 0.000000e+00
 686         br i1 %loopcond, label %loop, label %afterloop
 687
 688 afterloop:              ; preds = %loop
 689         ; loop always returns 0.0
 690         ret double 0.000000e+00
 691 }
 692 </pre>
 693 </div>
 694
 695 <p>This loop contains all the same constructs we saw before: a phi node, several
 696 expressions, and some basic blocks.  Lets see how this fits together.</p>
 697
 698 </div>
 699
 700 <!-- ======================================================================= -->
 701 <div class="doc_subsubsection"><a name="forcodegen">Code Generation for
 702 the 'for' Loop</a></div>
 703 <!-- ======================================================================= -->
 704
 705 <div class="doc_text">
 706
 707 <p>The first part of codegen is very simple: we just output the start expression
 708 for the loop value:</p>
 709
 710 <div class="doc_code">
 711 <pre>
 712 Value *ForExprAST::Codegen() {
 713   // Emit the start code first, without 'variable' in scope.
 714   Value *StartVal = Start-&gt;Codegen();
 715   if (StartVal == 0) return 0;
 716 </pre>
 717 </div>
 718
 719 <p>With this out of the way, the next step is to set up the LLVM basic block
 720 for the start of the loop body.  In the case above, the whole loop body is one
 721 block, but remember that the body code itself could consist of multiple blocks
 722 (e.g. if it contains an if/then/else or a for/in expression).</p>
 723
 724 <div class="doc_code">
 725 <pre>
 726   // Make the new basic block for the loop header, inserting after current
 727   // block.
 728   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 729   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
 730   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
 731
 732   // Insert an explicit fall through from the current block to the LoopBB.
 733   Builder.CreateBr(LoopBB);
 734 </pre>
 735 </div>
 736
 737 <p>This code is similar to what we saw for if/then/else.  Because we will need
 738 it to create the Phi node, we remember the block that falls through into the
 739 loop.  Once we have that, we create the actual block that starts the loop and
 740 create an unconditional branch for the fall-through between the two blocks.</p>
 741
 742 <div class="doc_code">
 743 <pre>
 744   // Start insertion in LoopBB.
 745   Builder.SetInsertPoint(LoopBB);
 746
 747   // Start the PHI node with an entry for Start.
 748   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
 749   Variable-&gt;addIncoming(StartVal, PreheaderBB);
 750 </pre>
 751 </div>
 752
 753 <p>Now that the "preheader" for the loop is set up, we switch to emitting code
 754 for the loop body.  To begin with, we move the insertion point and create the
 755 PHI node for the loop induction variable.  Since we already know the incoming
 756 value for the starting value, we add it to the Phi node.  Note that the Phi will
 757 eventually get a second value for the backedge, but we can't set it up yet
 758 (because it doesn't exist!).</p>
 759
 760 <div class="doc_code">
 761 <pre>
 762   // Within the loop, the variable is defined equal to the PHI node.  If it
 763   // shadows an existing variable, we have to restore it, so save it now.
 764   Value *OldVal = NamedValues[VarName];
 765   NamedValues[VarName] = Variable;
 766
 767   // Emit the body of the loop.  This, like any other expr, can change the
 768   // current BB.  Note that we ignore the value computed by the body, but don't
 769   // allow an error.
 770   if (Body-&gt;Codegen() == 0)
 771     return 0;
 772 </pre>
 773 </div>
 774
 775 <p>Now the code starts to get more interesting.  Our 'for' loop introduces a new
 776 variable to the symbol table.  This means that our symbol table can now contain
 777 either function arguments or loop variables.  To handle this, before we codegen
 778 the body of the loop, we add the loop variable as the current value for its
 779 name.  Note that it is possible that there is a variable of the same name in the
 780 outer scope.  It would be easy to make this an error (emit an error and return
 781 null if there is already an entry for VarName) but we choose to allow shadowing
 782 of variables.  In order to handle this correctly, we remember the Value that
 783 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
 784 no shadowed variable).</p>
 785
 786 <p>Once the loop variable is set into the symbol table, the code recursively
 787 codegen's the body.  This allows the body to use the loop variable: any
 788 references to it will naturally find it in the symbol table.</p>
 789
 790 <div class="doc_code">
 791 <pre>
 792   // Emit the step value.
 793   Value *StepVal;
 794   if (Step) {
 795     StepVal = Step-&gt;Codegen();
 796     if (StepVal == 0) return 0;
 797   } else {
 798     // If not specified, use 1.0.
 799     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
 800   }
 801
 802   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
 803 </pre>
 804 </div>
 805
 806 <p>Now that the body is emitted, we compute the next value of the iteration
 807 variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
 808 will be the value of the loop variable on the next iteration of the loop.</p>
 809
 810 <div class="doc_code">
 811 <pre>
 812   // Compute the end condition.
 813   Value *EndCond = End-&gt;Codegen();
 814   if (EndCond == 0) return EndCond;
 815
 816   // Convert condition to a bool by comparing equal to 0.0.
 817   EndCond = Builder.CreateFCmpONE(EndCond,
 818                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 819                                   "loopcond");
 820 </pre>
 821 </div>
 822
 823 <p>Finally, we evaluate the exit value of the loop, to determine whether the
 824 loop should exit.  This mirrors the condition evaluation for the if/then/else
 825 statement.</p>
 826
 827 <div class="doc_code">
 828 <pre>
 829   // Create the "after loop" block and insert it.
 830   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
 831   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
 832
 833   // Insert the conditional branch into the end of LoopEndBB.
 834   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
 835
 836   // Any new code will be inserted in AfterBB.
 837   Builder.SetInsertPoint(AfterBB);
 838 </pre>
 839 </div>
 840
 841 <p>With the code for the body of the loop complete, we just need to finish up
 842 the control flow for it.  This remembers the end block (for the phi node), then
 843 creates the block for the loop exit ("afterloop").  Based on the value of the
 844 exit condition, it creates a conditional branch that chooses between executing
 845 the loop again and exiting the loop.  Any future code is emitted in the
 846 "afterloop" block, so it sets the insertion position to it.</p>
 847
 848 <div class="doc_code">
 849 <pre>
 850   // Add a new entry to the PHI node for the backedge.
 851   Variable-&gt;addIncoming(NextVar, LoopEndBB);
 852
 853   // Restore the unshadowed variable.
 854   if (OldVal)
 855     NamedValues[VarName] = OldVal;
 856   else
 857     NamedValues.erase(VarName);
 858
 859   // for expr always returns 0.0.
 860   return Constant::getNullValue(Type::DoubleTy);
 861 }
 862 </pre>
 863 </div>
 864
 865 <p>The final code handles various cleanups: now that we have the "NextVar"
 866 value, we can add the incoming value to the loop PHI node.  After that, we
 867 remove the loop variable from the symbol table, so that it isn't in scope after
 868 the for loop.  Finally, code generation of the for loop always returns 0.0, so
 869 that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
 870
 871 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
 872 the tutorial.  We added two control flow constructs, and used them to motivate
 873 a couple of aspects of the LLVM IR that are important for front-end implementors
 874 to know.  In the next chapter of our saga, we will get a bit crazier and add
 875 <a href="LangImpl6.html">user-defined operators</a> to our poor innocent
 876 language.</p>
 877
 878 </div>
 879
 880 <!-- *********************************************************************** -->
 881 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 882 <!-- *********************************************************************** -->
 883
 884 <div class="doc_text">
 885
 886 <p>
 887 Here is the complete code listing for our running example, enhanced with the
 888 if/then/else and for expressions..  To build this example, use:
 889 </p>
 890
 891 <div class="doc_code">
 892 <pre>
 893    # Compile
 894    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 895    # Run
 896    ./toy
 897 </pre>
 898 </div>
 899
 900 <p>Here is the code:</p>
 901
 902 <div class="doc_code">
 903 <pre>
 904 #include "llvm/DerivedTypes.h"
 905 #include "llvm/ExecutionEngine/ExecutionEngine.h"
 906 #include "llvm/Module.h"
 907 #include "llvm/ModuleProvider.h"
 908 #include "llvm/PassManager.h"
 909 #include "llvm/Analysis/Verifier.h"
 910 #include "llvm/Target/TargetData.h"
 911 #include "llvm/Transforms/Scalar.h"
 912 #include "llvm/Support/LLVMBuilder.h"
 913 #include &lt;cstdio&gt;
 914 #include &lt;string&gt;
 915 #include &lt;map&gt;
 916 #include &lt;vector&gt;
 917 using namespace llvm;
 918
 919 //===----------------------------------------------------------------------===//
 920 // Lexer
 921 //===----------------------------------------------------------------------===//
 922
 923 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 924 // of these for known things.
 925 enum Token {
 926   tok_eof = -1,
 927
 928   // commands
 929   tok_def = -2, tok_extern = -3,
 930
 931   // primary
 932   tok_identifier = -4, tok_number = -5,
 933
 934   // control
 935   tok_if = -6, tok_then = -7, tok_else = -8,
 936   tok_for = -9, tok_in = -10
 937 };
 938
 939 static std::string IdentifierStr;  // Filled in if tok_identifier
 940 static double NumVal;              // Filled in if tok_number
 941
 942 /// gettok - Return the next token from standard input.
 943 static int gettok() {
 944   static int LastChar = ' ';
 945
 946   // Skip any whitespace.
 947   while (isspace(LastChar))
 948     LastChar = getchar();
 949
 950   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
 951     IdentifierStr = LastChar;
 952     while (isalnum((LastChar = getchar())))
 953       IdentifierStr += LastChar;
 954
 955     if (IdentifierStr == "def") return tok_def;
 956     if (IdentifierStr == "extern") return tok_extern;
 957     if (IdentifierStr == "if") return tok_if;
 958     if (IdentifierStr == "then") return tok_then;
 959     if (IdentifierStr == "else") return tok_else;
 960     if (IdentifierStr == "for") return tok_for;
 961     if (IdentifierStr == "in") return tok_in;
 962     return tok_identifier;
 963   }
 964
 965   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
 966     std::string NumStr;
 967     do {
 968       NumStr += LastChar;
 969       LastChar = getchar();
 970     } while (isdigit(LastChar) || LastChar == '.');
 971
 972     NumVal = strtod(NumStr.c_str(), 0);
 973     return tok_number;
 974   }
 975
 976   if (LastChar == '#') {
 977     // Comment until end of line.
 978     do LastChar = getchar();
 979     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
 980
 981     if (LastChar != EOF)
 982       return gettok();
 983   }
 984
 985   // Check for end of file.  Don't eat the EOF.
 986   if (LastChar == EOF)
 987     return tok_eof;
 988
 989   // Otherwise, just return the character as its ascii value.
 990   int ThisChar = LastChar;
 991   LastChar = getchar();
 992   return ThisChar;
 993 }
 994
 995 //===----------------------------------------------------------------------===//
 996 // Abstract Syntax Tree (aka Parse Tree)
 997 //===----------------------------------------------------------------------===//
 998
 999 /// ExprAST - Base class for all expression nodes.
1000 class ExprAST {
1001 public:
1002   virtual ~ExprAST() {}
1003   virtual Value *Codegen() = 0;
1004 };
1005
1006 /// NumberExprAST - Expression class for numeric literals like "1.0".
1007 class NumberExprAST : public ExprAST {
1008   double Val;
1009 public:
1010   NumberExprAST(double val) : Val(val) {}
1011   virtual Value *Codegen();
1012 };
1013
1014 /// VariableExprAST - Expression class for referencing a variable, like "a".
1015 class VariableExprAST : public ExprAST {
1016   std::string Name;
1017 public:
1018   VariableExprAST(const std::string &amp;name) : Name(name) {}
1019   virtual Value *Codegen();
1020 };
1021
1022 /// BinaryExprAST - Expression class for a binary operator.
1023 class BinaryExprAST : public ExprAST {
1024   char Op;
1025   ExprAST *LHS, *RHS;
1026 public:
1027   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1028     : Op(op), LHS(lhs), RHS(rhs) {}
1029   virtual Value *Codegen();
1030 };
1031
1032 /// CallExprAST - Expression class for function calls.
1033 class CallExprAST : public ExprAST {
1034   std::string Callee;
1035   std::vector&lt;ExprAST*&gt; Args;
1036 public:
1037   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1038     : Callee(callee), Args(args) {}
1039   virtual Value *Codegen();
1040 };
1041
1042 /// IfExprAST - Expression class for if/then/else.
1043 class IfExprAST : public ExprAST {
1044   ExprAST *Cond, *Then, *Else;
1045 public:
1046   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1047   : Cond(cond), Then(then), Else(_else) {}
1048   virtual Value *Codegen();
1049 };
1050
1051 /// ForExprAST - Expression class for for/in.
1052 class ForExprAST : public ExprAST {
1053   std::string VarName;
1054   ExprAST *Start, *End, *Step, *Body;
1055 public:
1056   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1057              ExprAST *step, ExprAST *body)
1058     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1059   virtual Value *Codegen();
1060 };
1061
1062 /// PrototypeAST - This class represents the "prototype" for a function,
1063 /// which captures its argument names as well as if it is an operator.
1064 class PrototypeAST {
1065   std::string Name;
1066   std::vector&lt;std::string&gt; Args;
1067 public:
1068   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1069     : Name(name), Args(args) {}
1070
1071   Function *Codegen();
1072 };
1073
1074 /// FunctionAST - This class represents a function definition itself.
1075 class FunctionAST {
1076   PrototypeAST *Proto;
1077   ExprAST *Body;
1078 public:
1079   FunctionAST(PrototypeAST *proto, ExprAST *body)
1080     : Proto(proto), Body(body) {}
1081
1082   Function *Codegen();
1083 };
1084
1085 //===----------------------------------------------------------------------===//
1086 // Parser
1087 //===----------------------------------------------------------------------===//
1088
1089 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1090 /// token the parser it looking at.  getNextToken reads another token from the
1091 /// lexer and updates CurTok with its results.
1092 static int CurTok;
1093 static int getNextToken() {
1094   return CurTok = gettok();
1095 }
1096
1097 /// BinopPrecedence - This holds the precedence for each binary operator that is
1098 /// defined.
1099 static std::map&lt;char, int&gt; BinopPrecedence;
1100
1101 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1102 static int GetTokPrecedence() {
1103   if (!isascii(CurTok))
1104     return -1;
1105
1106   // Make sure it's a declared binop.
1107   int TokPrec = BinopPrecedence[CurTok];
1108   if (TokPrec &lt;= 0) return -1;
1109   return TokPrec;
1110 }
1111
1112 /// Error* - These are little helper functions for error handling.
1113 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1114 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1115 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1116
1117 static ExprAST *ParseExpression();
1118
1119 /// identifierexpr
1120 ///   ::= identifier
1121 ///   ::= identifier '(' expression* ')'
1122 static ExprAST *ParseIdentifierExpr() {
1123   std::string IdName = IdentifierStr;
1124
1125   getNextToken();  // eat identifier.
1126
1127   if (CurTok != '(') // Simple variable ref.
1128     return new VariableExprAST(IdName);
1129
1130   // Call.
1131   getNextToken();  // eat (
1132   std::vector&lt;ExprAST*&gt; Args;
1133   if (CurTok != ')') {
1134     while (1) {
1135       ExprAST *Arg = ParseExpression();
1136       if (!Arg) return 0;
1137       Args.push_back(Arg);
1138
1139       if (CurTok == ')') break;
1140
1141       if (CurTok != ',')
1142         return Error("Expected ')'");
1143       getNextToken();
1144     }
1145   }
1146
1147   // Eat the ')'.
1148   getNextToken();
1149
1150   return new CallExprAST(IdName, Args);
1151 }
1152
1153 /// numberexpr ::= number
1154 static ExprAST *ParseNumberExpr() {
1155   ExprAST *Result = new NumberExprAST(NumVal);
1156   getNextToken(); // consume the number
1157   return Result;
1158 }
1159
1160 /// parenexpr ::= '(' expression ')'
1161 static ExprAST *ParseParenExpr() {
1162   getNextToken();  // eat (.
1163   ExprAST *V = ParseExpression();
1164   if (!V) return 0;
1165
1166   if (CurTok != ')')
1167     return Error("expected ')'");
1168   getNextToken();  // eat ).
1169   return V;
1170 }
1171
1172 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1173 static ExprAST *ParseIfExpr() {
1174   getNextToken();  // eat the if.
1175
1176   // condition.
1177   ExprAST *Cond = ParseExpression();
1178   if (!Cond) return 0;
1179
1180   if (CurTok != tok_then)
1181     return Error("expected then");
1182   getNextToken();  // eat the then
1183
1184   ExprAST *Then = ParseExpression();
1185   if (Then == 0) return 0;
1186
1187   if (CurTok != tok_else)
1188     return Error("expected else");
1189
1190   getNextToken();
1191
1192   ExprAST *Else = ParseExpression();
1193   if (!Else) return 0;
1194
1195   return new IfExprAST(Cond, Then, Else);
1196 }
1197
1198 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1199 static ExprAST *ParseForExpr() {
1200   getNextToken();  // eat the for.
1201
1202   if (CurTok != tok_identifier)
1203     return Error("expected identifier after for");
1204
1205   std::string IdName = IdentifierStr;
1206   getNextToken();  // eat identifier.
1207
1208   if (CurTok != '=')
1209     return Error("expected '=' after for");
1210   getNextToken();  // eat '='.
1211
1212
1213   ExprAST *Start = ParseExpression();
1214   if (Start == 0) return 0;
1215   if (CurTok != ',')
1216     return Error("expected ',' after for start value");
1217   getNextToken();
1218
1219   ExprAST *End = ParseExpression();
1220   if (End == 0) return 0;
1221
1222   // The step value is optional.
1223   ExprAST *Step = 0;
1224   if (CurTok == ',') {
1225     getNextToken();
1226     Step = ParseExpression();
1227     if (Step == 0) return 0;
1228   }
1229
1230   if (CurTok != tok_in)
1231     return Error("expected 'in' after for");
1232   getNextToken();  // eat 'in'.
1233
1234   ExprAST *Body = ParseExpression();
1235   if (Body == 0) return 0;
1236
1237   return new ForExprAST(IdName, Start, End, Step, Body);
1238 }
1239
1240
1241 /// primary
1242 ///   ::= identifierexpr
1243 ///   ::= numberexpr
1244 ///   ::= parenexpr
1245 ///   ::= ifexpr
1246 ///   ::= forexpr
1247 static ExprAST *ParsePrimary() {
1248   switch (CurTok) {
1249   default: return Error("unknown token when expecting an expression");
1250   case tok_identifier: return ParseIdentifierExpr();
1251   case tok_number:     return ParseNumberExpr();
1252   case '(':            return ParseParenExpr();
1253   case tok_if:         return ParseIfExpr();
1254   case tok_for:        return ParseForExpr();
1255   }
1256 }
1257
1258 /// binoprhs
1259 ///   ::= ('+' primary)*
1260 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1261   // If this is a binop, find its precedence.
1262   while (1) {
1263     int TokPrec = GetTokPrecedence();
1264
1265     // If this is a binop that binds at least as tightly as the current binop,
1266     // consume it, otherwise we are done.
1267     if (TokPrec &lt; ExprPrec)
1268       return LHS;
1269
1270     // Okay, we know this is a binop.
1271     int BinOp = CurTok;
1272     getNextToken();  // eat binop
1273
1274     // Parse the primary expression after the binary operator.
1275     ExprAST *RHS = ParsePrimary();
1276     if (!RHS) return 0;
1277
1278     // If BinOp binds less tightly with RHS than the operator after RHS, let
1279     // the pending operator take RHS as its LHS.
1280     int NextPrec = GetTokPrecedence();
1281     if (TokPrec &lt; NextPrec) {
1282       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1283       if (RHS == 0) return 0;
1284     }
1285
1286     // Merge LHS/RHS.
1287     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1288   }
1289 }
1290
1291 /// expression
1292 ///   ::= primary binoprhs
1293 ///
1294 static ExprAST *ParseExpression() {
1295   ExprAST *LHS = ParsePrimary();
1296   if (!LHS) return 0;
1297
1298   return ParseBinOpRHS(0, LHS);
1299 }
1300
1301 /// prototype
1302 ///   ::= id '(' id* ')'
1303 static PrototypeAST *ParsePrototype() {
1304   if (CurTok != tok_identifier)
1305     return ErrorP("Expected function name in prototype");
1306
1307   std::string FnName = IdentifierStr;
1308   getNextToken();
1309
1310   if (CurTok != '(')
1311     return ErrorP("Expected '(' in prototype");
1312
1313   std::vector&lt;std::string&gt; ArgNames;
1314   while (getNextToken() == tok_identifier)
1315     ArgNames.push_back(IdentifierStr);
1316   if (CurTok != ')')
1317     return ErrorP("Expected ')' in prototype");
1318
1319   // success.
1320   getNextToken();  // eat ')'.
1321
1322   return new PrototypeAST(FnName, ArgNames);
1323 }
1324
1325 /// definition ::= 'def' prototype expression
1326 static FunctionAST *ParseDefinition() {
1327   getNextToken();  // eat def.
1328   PrototypeAST *Proto = ParsePrototype();
1329   if (Proto == 0) return 0;
1330
1331   if (ExprAST *E = ParseExpression())
1332     return new FunctionAST(Proto, E);
1333   return 0;
1334 }
1335
1336 /// toplevelexpr ::= expression
1337 static FunctionAST *ParseTopLevelExpr() {
1338   if (ExprAST *E = ParseExpression()) {
1339     // Make an anonymous proto.
1340     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1341     return new FunctionAST(Proto, E);
1342   }
1343   return 0;
1344 }
1345
1346 /// external ::= 'extern' prototype
1347 static PrototypeAST *ParseExtern() {
1348   getNextToken();  // eat extern.
1349   return ParsePrototype();
1350 }
1351
1352 //===----------------------------------------------------------------------===//
1353 // Code Generation
1354 //===----------------------------------------------------------------------===//
1355
1356 static Module *TheModule;
1357 static LLVMFoldingBuilder Builder;
1358 static std::map&lt;std::string, Value*&gt; NamedValues;
1359 static FunctionPassManager *TheFPM;
1360
1361 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1362
1363 Value *NumberExprAST::Codegen() {
1364   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1365 }
1366
1367 Value *VariableExprAST::Codegen() {
1368   // Look this variable up in the function.
1369   Value *V = NamedValues[Name];
1370   return V ? V : ErrorV("Unknown variable name");
1371 }
1372
1373 Value *BinaryExprAST::Codegen() {
1374   Value *L = LHS-&gt;Codegen();
1375   Value *R = RHS-&gt;Codegen();
1376   if (L == 0 || R == 0) return 0;
1377
1378   switch (Op) {
1379   case '+': return Builder.CreateAdd(L, R, "addtmp");
1380   case '-': return Builder.CreateSub(L, R, "subtmp");
1381   case '*': return Builder.CreateMul(L, R, "multmp");
1382   case '&lt;':
1383     L = Builder.CreateFCmpULT(L, R, "cmptmp");
1384     // Convert bool 0/1 to double 0.0 or 1.0
1385     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1386   default: return ErrorV("invalid binary operator");
1387   }
1388 }
1389
1390 Value *CallExprAST::Codegen() {
1391   // Look up the name in the global module table.
1392   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1393   if (CalleeF == 0)
1394     return ErrorV("Unknown function referenced");
1395
1396   // If argument mismatch error.
1397   if (CalleeF-&gt;arg_size() != Args.size())
1398     return ErrorV("Incorrect # arguments passed");
1399
1400   std::vector&lt;Value*&gt; ArgsV;
1401   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1402     ArgsV.push_back(Args[i]-&gt;Codegen());
1403     if (ArgsV.back() == 0) return 0;
1404   }
1405
1406   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1407 }
1408
1409 Value *IfExprAST::Codegen() {
1410   Value *CondV = Cond-&gt;Codegen();
1411   if (CondV == 0) return 0;
1412
1413   // Convert condition to a bool by comparing equal to 0.0.
1414   CondV = Builder.CreateFCmpONE(CondV,
1415                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1416                                 "ifcond");
1417
1418   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1419
1420   // Create blocks for the then and else cases.  Insert the 'then' block at the
1421   // end of the function.
1422   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1423   BasicBlock *ElseBB = new BasicBlock("else");
1424   BasicBlock *MergeBB = new BasicBlock("ifcont");
1425
1426   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1427
1428   // Emit then value.
1429   Builder.SetInsertPoint(ThenBB);
1430
1431   Value *ThenV = Then-&gt;Codegen();
1432   if (ThenV == 0) return 0;
1433
1434   Builder.CreateBr(MergeBB);
1435   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1436   ThenBB = Builder.GetInsertBlock();
1437
1438   // Emit else block.
1439   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1440   Builder.SetInsertPoint(ElseBB);
1441
1442   Value *ElseV = Else-&gt;Codegen();
1443   if (ElseV == 0) return 0;
1444
1445   Builder.CreateBr(MergeBB);
1446   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1447   ElseBB = Builder.GetInsertBlock();
1448
1449   // Emit merge block.
1450   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1451   Builder.SetInsertPoint(MergeBB);
1452   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1453
1454   PN-&gt;addIncoming(ThenV, ThenBB);
1455   PN-&gt;addIncoming(ElseV, ElseBB);
1456   return PN;
1457 }
1458
1459 Value *ForExprAST::Codegen() {
1460   // Output this as:
1461   //   ...
1462   //   start = startexpr
1463   //   goto loop
1464   // loop:
1465   //   variable = phi [start, loopheader], [nextvariable, loopend]
1466   //   ...
1467   //   bodyexpr
1468   //   ...
1469   // loopend:
1470   //   step = stepexpr
1471   //   nextvariable = variable + step
1472   //   endcond = endexpr
1473   //   br endcond, loop, endloop
1474   // outloop:
1475
1476   // Emit the start code first, without 'variable' in scope.
1477   Value *StartVal = Start-&gt;Codegen();
1478   if (StartVal == 0) return 0;
1479
1480   // Make the new basic block for the loop header, inserting after current
1481   // block.
1482   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1483   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1484   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1485
1486   // Insert an explicit fall through from the current block to the LoopBB.
1487   Builder.CreateBr(LoopBB);
1488
1489   // Start insertion in LoopBB.
1490   Builder.SetInsertPoint(LoopBB);
1491
1492   // Start the PHI node with an entry for Start.
1493   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1494   Variable-&gt;addIncoming(StartVal, PreheaderBB);
1495
1496   // Within the loop, the variable is defined equal to the PHI node.  If it
1497   // shadows an existing variable, we have to restore it, so save it now.
1498   Value *OldVal = NamedValues[VarName];
1499   NamedValues[VarName] = Variable;
1500
1501   // Emit the body of the loop.  This, like any other expr, can change the
1502   // current BB.  Note that we ignore the value computed by the body, but don't
1503   // allow an error.
1504   if (Body-&gt;Codegen() == 0)
1505     return 0;
1506
1507   // Emit the step value.
1508   Value *StepVal;
1509   if (Step) {
1510     StepVal = Step-&gt;Codegen();
1511     if (StepVal == 0) return 0;
1512   } else {
1513     // If not specified, use 1.0.
1514     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1515   }
1516
1517   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1518
1519   // Compute the end condition.
1520   Value *EndCond = End-&gt;Codegen();
1521   if (EndCond == 0) return EndCond;
1522
1523   // Convert condition to a bool by comparing equal to 0.0.
1524   EndCond = Builder.CreateFCmpONE(EndCond,
1525                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1526                                   "loopcond");
1527
1528   // Create the "after loop" block and insert it.
1529   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1530   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1531
1532   // Insert the conditional branch into the end of LoopEndBB.
1533   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1534
1535   // Any new code will be inserted in AfterBB.
1536   Builder.SetInsertPoint(AfterBB);
1537
1538   // Add a new entry to the PHI node for the backedge.
1539   Variable-&gt;addIncoming(NextVar, LoopEndBB);
1540
1541   // Restore the unshadowed variable.
1542   if (OldVal)
1543     NamedValues[VarName] = OldVal;
1544   else
1545     NamedValues.erase(VarName);
1546
1547
1548   // for expr always returns 0.0.
1549   return Constant::getNullValue(Type::DoubleTy);
1550 }
1551
1552 Function *PrototypeAST::Codegen() {
1553   // Make the function type:  double(double,double) etc.
1554   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1555   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1556
1557   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1558
1559   // If F conflicted, there was already something named 'Name'.  If it has a
1560   // body, don't allow redefinition or reextern.
1561   if (F-&gt;getName() != Name) {
1562     // Delete the one we just made and get the existing one.
1563     F-&gt;eraseFromParent();
1564     F = TheModule-&gt;getFunction(Name);
1565
1566     // If F already has a body, reject this.
1567     if (!F-&gt;empty()) {
1568       ErrorF("redefinition of function");
1569       return 0;
1570     }
1571
1572     // If F took a different number of args, reject.
1573     if (F-&gt;arg_size() != Args.size()) {
1574       ErrorF("redefinition of function with different # args");
1575       return 0;
1576     }
1577   }
1578
1579   // Set names for all arguments.
1580   unsigned Idx = 0;
1581   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1582        ++AI, ++Idx) {
1583     AI-&gt;setName(Args[Idx]);
1584
1585     // Add arguments to variable symbol table.
1586     NamedValues[Args[Idx]] = AI;
1587   }
1588
1589   return F;
1590 }
1591
1592 Function *FunctionAST::Codegen() {
1593   NamedValues.clear();
1594
1595   Function *TheFunction = Proto-&gt;Codegen();
1596   if (TheFunction == 0)
1597     return 0;
1598
1599   // Create a new basic block to start insertion into.
1600   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1601   Builder.SetInsertPoint(BB);
1602
1603   if (Value *RetVal = Body-&gt;Codegen()) {
1604     // Finish off the function.
1605     Builder.CreateRet(RetVal);
1606
1607     // Validate the generated code, checking for consistency.
1608     verifyFunction(*TheFunction);
1609
1610     // Optimize the function.
1611     TheFPM-&gt;run(*TheFunction);
1612
1613     return TheFunction;
1614   }
1615
1616   // Error reading body, remove function.
1617   TheFunction-&gt;eraseFromParent();
1618   return 0;
1619 }
1620
1621 //===----------------------------------------------------------------------===//
1622 // Top-Level parsing and JIT Driver
1623 //===----------------------------------------------------------------------===//
1624
1625 static ExecutionEngine *TheExecutionEngine;
1626
1627 static void HandleDefinition() {
1628   if (FunctionAST *F = ParseDefinition()) {
1629     if (Function *LF = F-&gt;Codegen()) {
1630       fprintf(stderr, "Read function definition:");
1631       LF-&gt;dump();
1632     }
1633   } else {
1634     // Skip token for error recovery.
1635     getNextToken();
1636   }
1637 }
1638
1639 static void HandleExtern() {
1640   if (PrototypeAST *P = ParseExtern()) {
1641     if (Function *F = P-&gt;Codegen()) {
1642       fprintf(stderr, "Read extern: ");
1643       F-&gt;dump();
1644     }
1645   } else {
1646     // Skip token for error recovery.
1647     getNextToken();
1648   }
1649 }
1650
1651 static void HandleTopLevelExpression() {
1652   // Evaluate a top level expression into an anonymous function.
1653   if (FunctionAST *F = ParseTopLevelExpr()) {
1654     if (Function *LF = F-&gt;Codegen()) {
1655       // JIT the function, returning a function pointer.
1656       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1657
1658       // Cast it to the right type (takes no arguments, returns a double) so we
1659       // can call it as a native function.
1660       double (*FP)() = (double (*)())FPtr;
1661       fprintf(stderr, "Evaluated to %f\n", FP());
1662     }
1663   } else {
1664     // Skip token for error recovery.
1665     getNextToken();
1666   }
1667 }
1668
1669 /// top ::= definition | external | expression | ';'
1670 static void MainLoop() {
1671   while (1) {
1672     fprintf(stderr, "ready&gt; ");
1673     switch (CurTok) {
1674     case tok_eof:    return;
1675     case ';':        getNextToken(); break;  // ignore top level semicolons.
1676     case tok_def:    HandleDefinition(); break;
1677     case tok_extern: HandleExtern(); break;
1678     default:         HandleTopLevelExpression(); break;
1679     }
1680   }
1681 }
1682
1683
1684
1685 //===----------------------------------------------------------------------===//
1686 // "Library" functions that can be "extern'd" from user code.
1687 //===----------------------------------------------------------------------===//
1688
1689 /// putchard - putchar that takes a double and returns 0.
1690 extern "C"
1691 double putchard(double X) {
1692   putchar((char)X);
1693   return 0;
1694 }
1695
1696 //===----------------------------------------------------------------------===//
1697 // Main driver code.
1698 //===----------------------------------------------------------------------===//
1699
1700 int main() {
1701   // Install standard binary operators.
1702   // 1 is lowest precedence.
1703   BinopPrecedence['&lt;'] = 10;
1704   BinopPrecedence['+'] = 20;
1705   BinopPrecedence['-'] = 20;
1706   BinopPrecedence['*'] = 40;  // highest.
1707
1708   // Prime the first token.
1709   fprintf(stderr, "ready&gt; ");
1710   getNextToken();
1711
1712   // Make the module, which holds all the code.
1713   TheModule = new Module("my cool jit");
1714
1715   // Create the JIT.
1716   TheExecutionEngine = ExecutionEngine::create(TheModule);
1717
1718   {
1719     ExistingModuleProvider OurModuleProvider(TheModule);
1720     FunctionPassManager OurFPM(&amp;OurModuleProvider);
1721
1722     // Set up the optimizer pipeline.  Start with registering info about how the
1723     // target lays out data structures.
1724     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1725     // Do simple "peephole" optimizations and bit-twiddling optzns.
1726     OurFPM.add(createInstructionCombiningPass());
1727     // Reassociate expressions.
1728     OurFPM.add(createReassociatePass());
1729     // Eliminate Common SubExpressions.
1730     OurFPM.add(createGVNPass());
1731     // Simplify the control flow graph (deleting unreachable blocks, etc).
1732     OurFPM.add(createCFGSimplificationPass());
1733     // Set the global so the code gen can use this.
1734     TheFPM = &amp;OurFPM;
1735
1736     // Run the main "interpreter loop" now.
1737     MainLoop();
1738
1739     TheFPM = 0;
1740   }  // Free module provider and pass manager.
1741
1742
1743   // Print out all of the generated code.
1744   TheModule-&gt;dump();
1745   return 0;
1746 }
1747 </pre>
1748 </div>
1749
1750 </div>
1751
1752 <!-- *********************************************************************** -->
1753 <hr>
1754 <address>
1755   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1756   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1757   <a href="http://validator.w3.org/check/referer"><img
1758   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1759
1760   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1761   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1762   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1763 </address>
1764 </body>
1765 </html>