docs/tutorial/LangImpl5.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Control Flow</title>
   7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   8   <meta name="author" content="Chris Lattner">
   9   <link rel="stylesheet" href="../llvm.css" type="text/css">
  10 </head>
  11
  12 <body>
  13
  14 <div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
  15
  16 <ul>
  17 <li><a href="index.html">Up to Tutorial Index</a></li>
  18 <li>Chapter 5
  19   <ol>
  20     <li><a href="#intro">Chapter 5 Introduction</a></li>
  21     <li><a href="#ifthen">If/Then/Else</a>
  22     <ol>
  23       <li><a href="#iflexer">Lexer Extensions</a></li>
  24       <li><a href="#ifast">AST Extensions</a></li>
  25       <li><a href="#ifparser">Parser Extensions</a></li>
  26       <li><a href="#ifir">LLVM IR</a></li>
  27       <li><a href="#ifcodegen">Code Generation</a></li>
  28     </ol>
  29     </li>
  30     <li><a href="#for">'for' Loop Expression</a>
  31     <ol>
  32       <li><a href="#forlexer">Lexer Extensions</a></li>
  33       <li><a href="#forast">AST Extensions</a></li>
  34       <li><a href="#forparser">Parser Extensions</a></li>
  35       <li><a href="#forir">LLVM IR</a></li>
  36       <li><a href="#forcodegen">Code Generation</a></li>
  37     </ol>
  38     </li>
  39     <li><a href="#code">Full Code Listing</a></li>
  40   </ol>
  41 </li>
  42 <li><a href="LangImpl6.html">Chapter 6</a>: Extending the Language:
  43 User-defined Operators</li>
  44 </ul>
  45
  46 <div class="doc_author">
  47   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  48 </div>
  49
  50 <!-- *********************************************************************** -->
  51 <div class="doc_section"><a name="intro">Chapter 5 Introduction</a></div>
  52 <!-- *********************************************************************** -->
  53
  54 <div class="doc_text">
  55
  56 <p>Welcome to Chapter 5 of the "<a href="index.html">Implementing a language
  57 with LLVM</a>" tutorial.  Parts 1-4 described the implementation of the simple
  58 Kaleidoscope language and included support for generating LLVM IR, following by
  59 optimizations and a JIT compiler.  Unfortunately, as presented, Kaleidoscope is
  60 mostly useless: it has no control flow other than call and return.  This means
  61 that you can't have conditional branches in the code, significantly limiting its
  62 power.  In this episode of "build that compiler", we'll extend Kaleidoscope to
  63 have an if/then/else expression plus a simple looping construct.</p>
  64
  65 </div>
  66
  67 <!-- *********************************************************************** -->
  68 <div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
  69 <!-- *********************************************************************** -->
  70
  71 <div class="doc_text">
  72
  73 <p>
  74 Extending Kaleidoscope to support if/then/else is quite straight-forward.  It
  75 basically requires adding lexer support for this "new" concept to the lexer,
  76 parser, AST, and LLVM code emitter.  This example is nice, because it shows how
  77 easy it is to "grow" a language over time, incrementally extending it as new
  78 ideas are discovered.</p>
  79
  80 <p>Before we get going on "how" we do this extension, lets talk about what we
  81 want.  The basic idea is that we want to be able to write this sort of thing:
  82 </p>
  83
  84 <div class="doc_code">
  85 <pre>
  86 def fib(x)
  87   if x &lt; 3 then
  88     1
  89   else
  90     fib(x-1)+fib(x-2);
  91 </pre>
  92 </div>
  93
  94 <p>In Kaleidoscope, every construct is an expression: there are no statements.
  95 As such, the if/then/else expression needs to return a value like any other.
  96 Since we're using a mostly functional form, we'll have it evaluate its
  97 conditional, then return the 'then' or 'else' value based on how the condition
  98 was resolved.  This is very similar to the C "?:" expression.</p>
  99
 100 <p>The semantics of the if/then/else expression is that it first evaluates the
 101 condition to a boolean equality value: 0.0 is false and everything else is true.
 102 If the condition is true, the first subexpression is evaluated and returned, if
 103 the condition is false, the second subexpression is evaluated and returned.
 104 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
 105 </p>
 106
 107 <p>Now that we know what we want, lets break this down into its constituent
 108 pieces.</p>
 109
 110 </div>
 111
 112 <!-- ======================================================================= -->
 113 <div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
 114 If/Then/Else</a></div>
 115 <!-- ======================================================================= -->
 116
 117
 118 <div class="doc_text">
 119
 120 <p>The lexer extensions are straight-forward.  First we add new enum values
 121 for the relevant tokens:</p>
 122
 123 <div class="doc_code">
 124 <pre>
 125   // control
 126   tok_if = -6, tok_then = -7, tok_else = -8,
 127 </pre>
 128 </div>
 129
 130 <p>Once we have that, we recognize the new keywords in the lexer, pretty simple
 131 stuff:</p>
 132
 133 <div class="doc_code">
 134 <pre>
 135     ...
 136     if (IdentifierStr == "def") return tok_def;
 137     if (IdentifierStr == "extern") return tok_extern;
 138     <b>if (IdentifierStr == "if") return tok_if;
 139     if (IdentifierStr == "then") return tok_then;
 140     if (IdentifierStr == "else") return tok_else;</b>
 141     return tok_identifier;
 142 </pre>
 143 </div>
 144
 145 </div>
 146
 147 <!-- ======================================================================= -->
 148 <div class="doc_subsubsection"><a name="ifast">AST Extensions for
 149  If/Then/Else</a></div>
 150 <!-- ======================================================================= -->
 151
 152 <div class="doc_text">
 153
 154 <p>To represent the new expression we add a new AST node for it:</p>
 155
 156 <div class="doc_code">
 157 <pre>
 158 /// IfExprAST - Expression class for if/then/else.
 159 class IfExprAST : public ExprAST {
 160   ExprAST *Cond, *Then, *Else;
 161 public:
 162   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
 163     : Cond(cond), Then(then), Else(_else) {}
 164   virtual Value *Codegen();
 165 };
 166 </pre>
 167 </div>
 168
 169 <p>The AST node just has pointers to the various subexpressions.</p>
 170
 171 </div>
 172
 173 <!-- ======================================================================= -->
 174 <div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
 175 If/Then/Else</a></div>
 176 <!-- ======================================================================= -->
 177
 178 <div class="doc_text">
 179
 180 <p>Now that we have the relevant tokens coming from the lexer and we have the
 181 AST node to build, our parsing logic is relatively straight-forward.  First we
 182 define a new parsing function:</p>
 183
 184 <div class="doc_code">
 185 <pre>
 186 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
 187 static ExprAST *ParseIfExpr() {
 188   getNextToken();  // eat the if.
 189
 190   // condition.
 191   ExprAST *Cond = ParseExpression();
 192   if (!Cond) return 0;
 193
 194   if (CurTok != tok_then)
 195     return Error("expected then");
 196   getNextToken();  // eat the then
 197
 198   ExprAST *Then = ParseExpression();
 199   if (Then == 0) return 0;
 200
 201   if (CurTok != tok_else)
 202     return Error("expected else");
 203
 204   getNextToken();
 205
 206   ExprAST *Else = ParseExpression();
 207   if (!Else) return 0;
 208
 209   return new IfExprAST(Cond, Then, Else);
 210 }
 211 </pre>
 212 </div>
 213
 214 <p>Next we hook it up as a primary expression:</p>
 215
 216 <div class="doc_code">
 217 <pre>
 218 static ExprAST *ParsePrimary() {
 219   switch (CurTok) {
 220   default: return Error("unknown token when expecting an expression");
 221   case tok_identifier: return ParseIdentifierExpr();
 222   case tok_number:     return ParseNumberExpr();
 223   case '(':            return ParseParenExpr();
 224   <b>case tok_if:         return ParseIfExpr();</b>
 225   }
 226 }
 227 </pre>
 228 </div>
 229
 230 </div>
 231
 232 <!-- ======================================================================= -->
 233 <div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
 234 <!-- ======================================================================= -->
 235
 236 <div class="doc_text">
 237
 238 <p>Now that we have it parsing and building the AST, the final piece is adding
 239 LLVM code generation support.  This is the most interesting part of the
 240 if/then/else example, because this is where it starts to introduce new concepts.
 241 All of the code above has been described in previous chapters fairly thoroughly.
 242 </p>
 243
 244 <p>To motivate the code we want to produce, lets take a look at a simple
 245 example.  Consider:</p>
 246
 247 <div class="doc_code">
 248 <pre>
 249 extern foo();
 250 extern bar();
 251 def baz(x) if x then foo() else bar();
 252 </pre>
 253 </div>
 254
 255 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
 256 looks like this:</p>
 257
 258 <div class="doc_code">
 259 <pre>
 260 declare double @foo()
 261
 262 declare double @bar()
 263
 264 define double @baz(double %x) {
 265 entry:
 266         %ifcond = fcmp one double %x, 0.000000e+00
 267         br i1 %ifcond, label %then, label %else
 268
 269 then:           ; preds = %entry
 270         %calltmp = call double @foo()
 271         br label %ifcont
 272
 273 else:           ; preds = %entry
 274         %calltmp1 = call double @bar()
 275         br label %ifcont
 276
 277 ifcont:         ; preds = %else, %then
 278         %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
 279         ret double %iftmp
 280 }
 281 </pre>
 282 </div>
 283
 284 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
 285 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool.  If you put this LLVM IR
 286 into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
 287 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
 288 see this graph:</p>
 289
 290 <center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
 291 height="315"></center>
 292
 293 <p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
 294 "<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
 295 inserting actual calls into the code and recompiling or by calling these in the
 296 debugger.  LLVM has many nice features for visualizing various graphs.</p>
 297
 298 <p>Coming back to the generated code, it is fairly simple: the entry block
 299 evaluates the conditional expression ("x" in our case here) and compares the
 300 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
 301 instruction ('one' is "ordered and not equal").  Based on the result of this
 302 expression, the code jumps to either the "then" or "else" blocks, which contain
 303 the expressions for the true/false case.</p>
 304
 305 <p>Once the then/else blocks is finished executing, they both branch back to the
 306 else block to execute the code that happens after the if/then/else.  In this
 307 case the only thing left to do is to return to the caller of the function.  The
 308 question then becomes: how does the code know which expression to return?</p>
 309
 310 <p>The answer to this question involves an important SSA operation: the
 311 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
 312 operation</a>.  If you're not familiar with SSA, <a
 313 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
 314 article</a> is a good introduction and there are various other introductions to
 315 it available on your favorite search engine.  The short version is that
 316 "execution" of the Phi operation requires "remembering" which block control came
 317 from.  The Phi operation takes on the value corresponding to the input control
 318 block.  In this case, if control comes in from the "then" block, it gets the
 319 value of "calltmp".  If control comes from the "else" block, it gets the value
 320 of "calltmp1".</p>
 321
 322 <p>At this point, you are probably starting to think "on no! this means my
 323 simple and elegant front-end will have to start generating SSA form in order to
 324 use LLVM!".  Fortunately, this is not the case, and we strongly advise
 325 <em>not</em> implementing an SSA construction algorithm in your front-end
 326 unless there is an amazingly good reason to do so.  In practice, there are two
 327 sorts of values that float around in code written in your average imperative
 328 programming language that might need Phi nodes:</p>
 329
 330 <ol>
 331 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
 332 <li>Values that are implicit in the structure of your AST, such as the phi node
 333 in this case.</li>
 334 </ol>
 335
 336 <p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
 337 variables"), we'll talk about #1
 338 in depth.  For now, just believe me that you don't need SSA construction to
 339 handle them.  For #2, you have the choice of using the techniques that we will
 340 describe for #1, or you can insert Phi nodes directly if convenient.  In this
 341 case, it is really really easy to generate the Phi node, so we choose to do it
 342 directly.</p>
 343
 344 <p>Okay, enough of the motivation and overview, lets generate code!</p>
 345
 346 </div>
 347
 348 <!-- ======================================================================= -->
 349 <div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
 350 If/Then/Else</a></div>
 351 <!-- ======================================================================= -->
 352
 353 <div class="doc_text">
 354
 355 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
 356 for <tt>IfExprAST</tt>:</p>
 357
 358 <div class="doc_code">
 359 <pre>
 360 Value *IfExprAST::Codegen() {
 361   Value *CondV = Cond-&gt;Codegen();
 362   if (CondV == 0) return 0;
 363
 364   // Convert condition to a bool by comparing equal to 0.0.
 365   CondV = Builder.CreateFCmpONE(CondV,
 366                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 367                                 "ifcond");
 368 </pre>
 369 </div>
 370
 371 <p>This code is straight-forward and similar to what we saw before.  We emit the
 372 expression for the condition, then compare that value to zero to get a truth
 373 value as a 1-bit (bool) value.</p>
 374
 375 <div class="doc_code">
 376 <pre>
 377   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 378
 379   // Create blocks for the then and else cases.  Insert the 'then' block at the
 380   // end of the function.
 381   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
 382   BasicBlock *ElseBB = new BasicBlock("else");
 383   BasicBlock *MergeBB = new BasicBlock("ifcont");
 384
 385   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
 386 </pre>
 387 </div>
 388
 389 <p>This code creates the basic blocks that are related to the if/then/else
 390 statement, and correspond directly to the blocks in the example above.  The
 391 first line of this gets the current Function object that is being built.  It
 392 gets this by asking the builder for the current BasicBlock, and asking that
 393 block for its "parent" (the function it is currently embedded into).</p>
 394
 395 <p>Once it has that, it creates three blocks.  Note that it passes "TheFunction"
 396 into the constructor for the "then" block.  This causes the constructor to
 397 automatically insert the new block onto the end of the specified function.  The
 398 other two blocks are created, but aren't yet inserted into the function.</p>
 399
 400 <p>Once the blocks are created, we can emit the conditional branch that chooses
 401 between them.  Note that creating new blocks does not implicitly affect the
 402 LLVMBuilder, so it is still inserting into the block that the condition
 403 went into.  Also note that it is creating a branch to the "then" block and the
 404 "else" block, even though the "else" block isn't inserted into the function yet.
 405 This is all ok: it is the standard way that LLVM supports forward
 406 references.</p>
 407
 408 <div class="doc_code">
 409 <pre>
 410   // Emit then value.
 411   Builder.SetInsertPoint(ThenBB);
 412
 413   Value *ThenV = Then-&gt;Codegen();
 414   if (ThenV == 0) return 0;
 415
 416   Builder.CreateBr(MergeBB);
 417   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
 418   ThenBB = Builder.GetInsertBlock();
 419 </pre>
 420 </div>
 421
 422 <p>After the conditional branch is inserted, we move the builder to start
 423 inserting into the "then" block.  Strictly speaking, this call moves the
 424 insertion point to be at the end of the specified block.  However, since the
 425 "then" block is empty, it also starts out by inserting at the beginning of the
 426 block.  :)</p>
 427
 428 <p>Once the insertion point is set, we recursively codegen the "then" expression
 429 from the AST.  To finish off the then block, we create an unconditional branch
 430 to the merge block.  One interesting (and very important) aspect of the LLVM IR
 431 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
 432 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
 433 instruction</a> such as return or branch.  This means that all control flow,
 434 <em>including fall throughs</em> must be made explicit in the LLVM IR.  If you
 435 violate this rule, the verifier will emit an error.</p>
 436
 437 <p>The final line here is quite subtle, but is very important.  The basic issue
 438 is that when we create the Phi node in the merge block, we need to set up the
 439 block/value pairs that indicate how the Phi will work.  Importantly, the Phi
 440 node expects to have an entry for each predecessor of the block in the CFG.  Why
 441 then are we getting the current block when we just set it to ThenBB 5 lines
 442 above?  The problem is that the "Then" expression may actually itself change the
 443 block that the Builder is emitting into if, for example, it contains a nested
 444 "if/then/else" expression.  Because calling Codegen recursively could
 445 arbitrarily change the notion of the current block, we are required to get an
 446 up-to-date value for code that will set up the Phi node.</p>
 447
 448 <div class="doc_code">
 449 <pre>
 450   // Emit else block.
 451   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
 452   Builder.SetInsertPoint(ElseBB);
 453
 454   Value *ElseV = Else-&gt;Codegen();
 455   if (ElseV == 0) return 0;
 456
 457   Builder.CreateBr(MergeBB);
 458   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
 459   ElseBB = Builder.GetInsertBlock();
 460 </pre>
 461 </div>
 462
 463 <p>Code generation for the 'else' block is basically identical to codegen for
 464 the 'then' block.  The only significant difference is the first line, which adds
 465 the 'else' block to the function.  Recall previously that the 'else' block was
 466 created, but not added to the function.  Now that the 'then' and 'else' blocks
 467 are emitted, we can finish up with the merge code:</p>
 468
 469 <div class="doc_code">
 470 <pre>
 471   // Emit merge block.
 472   TheFunction->getBasicBlockList().push_back(MergeBB);
 473   Builder.SetInsertPoint(MergeBB);
 474   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
 475
 476   PN->addIncoming(ThenV, ThenBB);
 477   PN->addIncoming(ElseV, ElseBB);
 478   return PN;
 479 }
 480 </pre>
 481 </div>
 482
 483 <p>The first two lines here are now familiar: the first adds the "merge" block
 484 to the Function object (it was previously floating, like the else block above).
 485 The second block changes the insertion point so that newly created code will go
 486 into the "merge" block.  Once that is done, we need to create the PHI node and
 487 set up the block/value pairs for the PHI.</p>
 488
 489 <p>Finally, the CodeGen function returns the phi node as the value computed by
 490 the if/then/else expression.  In our example above, this returned value will
 491 feed into the code for the top-level function, which will create the return
 492 instruction.</p>
 493
 494 <p>Overall, we now have the ability to execution conditional code in
 495 Kaleidoscope.  With this extension, Kaleidoscope is a fairly complete language
 496 that can calculate a wide variety of numeric functions.  Next up we'll add
 497 another useful expression that is familiar from non-functional languages...</p>
 498
 499 </div>
 500
 501 <!-- *********************************************************************** -->
 502 <div class="doc_section"><a name="for">'for' Loop Expression</a></div>
 503 <!-- *********************************************************************** -->
 504
 505 <div class="doc_text">
 506
 507 <p>Now that we know how to add basic control flow constructs to the language,
 508 we have the tools to add more powerful things.  Lets add something more
 509 aggressive, a 'for' expression:</p>
 510
 511 <div class="doc_code">
 512 <pre>
 513  extern putchard(char)
 514  def printstar(n)
 515    for i = 1, i &lt; n, 1.0 in
 516      putchard(42);  # ascii 42 = '*'
 517
 518  # print 100 '*' characters
 519  printstar(100);
 520 </pre>
 521 </div>
 522
 523 <p>This expression defines a new variable ("i" in this case) which iterates from
 524 a starting value, while the condition ("i &lt; n" in this case) is true,
 525 incrementing by an optional step value ("1.0" in this case).  If the step value
 526 is omitted, it defaults to 1.0.  While the loop is true, it executes its
 527 body expression.  Because we don't have anything better to return, we'll just
 528 define the loop as always returning 0.0.  In the future when we have mutable
 529 variables, it will get more useful.</p>
 530
 531 <p>As before, lets talk about the changes that we need to Kaleidoscope to
 532 support this.</p>
 533
 534 </div>
 535
 536 <!-- ======================================================================= -->
 537 <div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
 538 the 'for' Loop</a></div>
 539 <!-- ======================================================================= -->
 540
 541 <div class="doc_text">
 542
 543 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
 544
 545 <div class="doc_code">
 546 <pre>
 547   ... in enum Token ...
 548   // control
 549   tok_if = -6, tok_then = -7, tok_else = -8,
 550 <b>  tok_for = -9, tok_in = -10</b>
 551
 552   ... in gettok ...
 553   if (IdentifierStr == "def") return tok_def;
 554   if (IdentifierStr == "extern") return tok_extern;
 555   if (IdentifierStr == "if") return tok_if;
 556   if (IdentifierStr == "then") return tok_then;
 557   if (IdentifierStr == "else") return tok_else;
 558   <b>if (IdentifierStr == "for") return tok_for;
 559   if (IdentifierStr == "in") return tok_in;</b>
 560   return tok_identifier;
 561 </pre>
 562 </div>
 563
 564 </div>
 565
 566 <!-- ======================================================================= -->
 567 <div class="doc_subsubsection"><a name="forast">AST Extensions for
 568 the 'for' Loop</a></div>
 569 <!-- ======================================================================= -->
 570
 571 <div class="doc_text">
 572
 573 <p>The AST node is similarly simple.  It basically boils down to capturing
 574 the variable name and the consituent expressions in the node.</p>
 575
 576 <div class="doc_code">
 577 <pre>
 578 /// ForExprAST - Expression class for for/in.
 579 class ForExprAST : public ExprAST {
 580   std::string VarName;
 581   ExprAST *Start, *End, *Step, *Body;
 582 public:
 583   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
 584              ExprAST *step, ExprAST *body)
 585     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
 586   virtual Value *Codegen();
 587 };
 588 </pre>
 589 </div>
 590
 591 </div>
 592
 593 <!-- ======================================================================= -->
 594 <div class="doc_subsubsection"><a name="forparser">Parser Extensions for
 595 the 'for' Loop</a></div>
 596 <!-- ======================================================================= -->
 597
 598 <div class="doc_text">
 599
 600 <p>The parser code is also fairly standard.  The only interesting thing here is
 601 handling of the optional step value.  The parser code handles it by checking to
 602 see if the second comma is present.  If not, it sets the step value to null in
 603 the AST node:</p>
 604
 605 <div class="doc_code">
 606 <pre>
 607 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
 608 static ExprAST *ParseForExpr() {
 609   getNextToken();  // eat the for.
 610
 611   if (CurTok != tok_identifier)
 612     return Error("expected identifier after for");
 613
 614   std::string IdName = IdentifierStr;
 615   getNextToken();  // eat identifier.
 616
 617   if (CurTok != '=')
 618     return Error("expected '=' after for");
 619   getNextToken();  // eat '='.
 620
 621
 622   ExprAST *Start = ParseExpression();
 623   if (Start == 0) return 0;
 624   if (CurTok != ',')
 625     return Error("expected ',' after for start value");
 626   getNextToken();
 627
 628   ExprAST *End = ParseExpression();
 629   if (End == 0) return 0;
 630
 631   // The step value is optional.
 632   ExprAST *Step = 0;
 633   if (CurTok == ',') {
 634     getNextToken();
 635     Step = ParseExpression();
 636     if (Step == 0) return 0;
 637   }
 638
 639   if (CurTok != tok_in)
 640     return Error("expected 'in' after for");
 641   getNextToken();  // eat 'in'.
 642
 643   ExprAST *Body = ParseExpression();
 644   if (Body == 0) return 0;
 645
 646   return new ForExprAST(IdName, Start, End, Step, Body);
 647 }
 648 </pre>
 649 </div>
 650
 651 </div>
 652
 653 <!-- ======================================================================= -->
 654 <div class="doc_subsubsection"><a name="forir">LLVM IR for
 655 the 'for' Loop</a></div>
 656 <!-- ======================================================================= -->
 657
 658 <div class="doc_text">
 659
 660 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
 661 With the simple example above, we get this LLVM IR (note that this dump is
 662 generated with optimizations disabled):
 663 </p>
 664
 665 <div class="doc_code">
 666 <pre>
 667 declare double @putchard(double)
 668
 669 define double @printstar(double %n) {
 670 entry:
 671         ; initial value = 1.0 (inlined into phi)
 672         br label %loop
 673
 674 loop:           ; preds = %loop, %entry
 675         %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
 676         ; body
 677         %calltmp = call double @putchard( double 4.200000e+01 )
 678         ; increment
 679         %nextvar = add double %i, 1.000000e+00
 680
 681         ; termination test
 682         %multmp = fcmp ult double %i, %n
 683         %booltmp = uitofp i1 %multmp to double
 684         %loopcond = fcmp one double %booltmp, 0.000000e+00
 685         br i1 %loopcond, label %loop, label %afterloop
 686
 687 afterloop:              ; preds = %loop
 688         ; loop always returns 0.0
 689         ret double 0.000000e+00
 690 }
 691 </pre>
 692 </div>
 693
 694 <p>This loop contains all the same constructs we saw before: a phi node, several
 695 expressions, and some basic blocks.  Lets see how this fits together.</p>
 696
 697 </div>
 698
 699 <!-- ======================================================================= -->
 700 <div class="doc_subsubsection"><a name="forcodegen">Code Generation for
 701 the 'for' Loop</a></div>
 702 <!-- ======================================================================= -->
 703
 704 <div class="doc_text">
 705
 706 <p>The first part of codegen is very simple: we just output the start expression
 707 for the loop value:</p>
 708
 709 <div class="doc_code">
 710 <pre>
 711 Value *ForExprAST::Codegen() {
 712   // Emit the start code first, without 'variable' in scope.
 713   Value *StartVal = Start-&gt;Codegen();
 714   if (StartVal == 0) return 0;
 715 </pre>
 716 </div>
 717
 718 <p>With this out of the way, the next step is to set up the LLVM basic block
 719 for the start of the loop body.  In the case above, the whole loop body is one
 720 block, but remember that the body code itself could consist of multiple blocks
 721 (e.g. if it is a if/then/else expression).</p>
 722
 723 <div class="doc_code">
 724 <pre>
 725   // Make the new basic block for the loop header, inserting after current
 726   // block.
 727   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 728   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
 729   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
 730
 731   // Insert an explicit fall through from the current block to the LoopBB.
 732   Builder.CreateBr(LoopBB);
 733 </pre>
 734 </div>
 735
 736 <p>This code is similar to what we saw for if/then/else.  Because we will need
 737 it to create the Phi node, we remember the block that falls through into the
 738 loop.  Once we have that, we create the actual block that starts the loop and
 739 create an unconditional branch for the fall-through between the two blocks.</p>
 740
 741 <div class="doc_code">
 742 <pre>
 743   // Start insertion in LoopBB.
 744   Builder.SetInsertPoint(LoopBB);
 745
 746   // Start the PHI node with an entry for Start.
 747   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
 748   Variable-&gt;addIncoming(StartVal, PreheaderBB);
 749 </pre>
 750 </div>
 751
 752 <p>Now that the "preheader" for the loop is set up, we switch to emitting code
 753 for the loop body.  To begin with, we move the insertion point and create the
 754 PHI node for the loop induction variable.  SInce we already know the incoming
 755 value for the starting value, we add it to the Phi node.  Note that the Phi will
 756 eventually get a second value for the backedge, but we can't set it up yet
 757 (because it doesn't exist!).</p>
 758
 759 <div class="doc_code">
 760 <pre>
 761   // Within the loop, the variable is defined equal to the PHI node.  If it
 762   // shadows an existing variable, we have to restore it, so save it now.
 763   Value *OldVal = NamedValues[VarName];
 764   NamedValues[VarName] = Variable;
 765
 766   // Emit the body of the loop.  This, like any other expr, can change the
 767   // current BB.  Note that we ignore the value computed by the body, but don't
 768   // allow an error.
 769   if (Body-&gt;Codegen() == 0)
 770     return 0;
 771 </pre>
 772 </div>
 773
 774 <p>Now the code starts to get more interesting.  Our 'for' loop introduces a new
 775 variable to the symbol table.  This means that our symbol table can now contain
 776 either function arguments or loop variables.  To handle this, before we codegen
 777 the body of the loop, we add the loop variable as the current value for its
 778 name.  Note that it is possible that there is a variable of the same name in the
 779 outer scope.  It would be easy to make this an error (emit an error and return
 780 null if there is already an entry for VarName) but we choose to allow shadowing
 781 of variables.  In order to handle this correctly, we remember the Value that
 782 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
 783 no shadowed variable).</p>
 784
 785 <p>Once the loop variable is set into the symbol table, the code recursively
 786 codegen's the body.  This allows the body to use the loop variable: any
 787 references to it will naturally find it in the symbol table.</p>
 788
 789 <div class="doc_code">
 790 <pre>
 791   // Emit the step value.
 792   Value *StepVal;
 793   if (Step) {
 794     StepVal = Step-&gt;Codegen();
 795     if (StepVal == 0) return 0;
 796   } else {
 797     // If not specified, use 1.0.
 798     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
 799   }
 800
 801   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
 802 </pre>
 803 </div>
 804
 805 <p>Now that the body is emitted, we compute the next value of the iteration
 806 variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
 807 will be the value of the loop variable on the next iteration of the loop.</p>
 808
 809 <div class="doc_code">
 810 <pre>
 811   // Compute the end condition.
 812   Value *EndCond = End-&gt;Codegen();
 813   if (EndCond == 0) return EndCond;
 814
 815   // Convert condition to a bool by comparing equal to 0.0.
 816   EndCond = Builder.CreateFCmpONE(EndCond,
 817                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 818                                   "loopcond");
 819 </pre>
 820 </div>
 821
 822 <p>Finally, we evaluate the exit value of the loop, to determine whether the
 823 loop should exit.  This mirrors the condition evaluation for the if/then/else
 824 statement.</p>
 825
 826 <div class="doc_code">
 827 <pre>
 828   // Create the "after loop" block and insert it.
 829   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
 830   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
 831
 832   // Insert the conditional branch into the end of LoopEndBB.
 833   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
 834
 835   // Any new code will be inserted in AfterBB.
 836   Builder.SetInsertPoint(AfterBB);
 837 </pre>
 838 </div>
 839
 840 <p>With the code for the body of the loop complete, we just need to finish up
 841 the control flow for it.  This remembers the end block (for the phi node), then
 842 creates the block for the loop exit ("afterloop").  Based on the value of the
 843 exit condition, it creates a conditional branch that chooses between executing
 844 the loop again and exiting the loop.  Any future code is emitted in the
 845 "afterloop" block, so it sets the insertion position to it.</p>
 846
 847 <div class="doc_code">
 848 <pre>
 849   // Add a new entry to the PHI node for the backedge.
 850   Variable-&gt;addIncoming(NextVar, LoopEndBB);
 851
 852   // Restore the unshadowed variable.
 853   if (OldVal)
 854     NamedValues[VarName] = OldVal;
 855   else
 856     NamedValues.erase(VarName);
 857
 858   // for expr always returns 0.0.
 859   return Constant::getNullValue(Type::DoubleTy);
 860 }
 861 </pre>
 862 </div>
 863
 864 <p>The final code handles various cleanups: now that we have the "NextVar"
 865 value, we can add the incoming value to the loop PHI node.  After that, we
 866 remove the loop variable from the symbol table, so that it isn't in scope after
 867 the for loop.  Finally, code generation of the for loop always returns 0.0, so
 868 that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
 869
 870 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
 871 the tutorial.  We added two control flow constructs, and used them to motivate
 872 a couple of aspects of the LLVM IR that are important for front-end implementors
 873 to know.  In the next chapter of our saga, we will get a bit crazier and add
 874 operator overloading to our poor innocent language.</p>
 875
 876 </div>
 877
 878 <!-- *********************************************************************** -->
 879 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 880 <!-- *********************************************************************** -->
 881
 882 <div class="doc_text">
 883
 884 <p>
 885 Here is the complete code listing for our running example, enhanced with the
 886 if/then/else and for expressions..  To build this example, use:
 887 </p>
 888
 889 <div class="doc_code">
 890 <pre>
 891    # Compile
 892    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 893    # Run
 894    ./toy
 895 </pre>
 896 </div>
 897
 898 <p>Here is the code:</p>
 899
 900 <div class="doc_code">
 901 <pre>
 902 #include "llvm/DerivedTypes.h"
 903 #include "llvm/ExecutionEngine/ExecutionEngine.h"
 904 #include "llvm/Module.h"
 905 #include "llvm/ModuleProvider.h"
 906 #include "llvm/PassManager.h"
 907 #include "llvm/Analysis/Verifier.h"
 908 #include "llvm/Target/TargetData.h"
 909 #include "llvm/Transforms/Scalar.h"
 910 #include "llvm/Support/LLVMBuilder.h"
 911 #include &lt;cstdio&gt;
 912 #include &lt;string&gt;
 913 #include &lt;map&gt;
 914 #include &lt;vector&gt;
 915 using namespace llvm;
 916
 917 //===----------------------------------------------------------------------===//
 918 // Lexer
 919 //===----------------------------------------------------------------------===//
 920
 921 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 922 // of these for known things.
 923 enum Token {
 924   tok_eof = -1,
 925
 926   // commands
 927   tok_def = -2, tok_extern = -3,
 928
 929   // primary
 930   tok_identifier = -4, tok_number = -5,
 931
 932   // control
 933   tok_if = -6, tok_then = -7, tok_else = -8,
 934   tok_for = -9, tok_in = -10
 935 };
 936
 937 static std::string IdentifierStr;  // Filled in if tok_identifier
 938 static double NumVal;              // Filled in if tok_number
 939
 940 /// gettok - Return the next token from standard input.
 941 static int gettok() {
 942   static int LastChar = ' ';
 943
 944   // Skip any whitespace.
 945   while (isspace(LastChar))
 946     LastChar = getchar();
 947
 948   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
 949     IdentifierStr = LastChar;
 950     while (isalnum((LastChar = getchar())))
 951       IdentifierStr += LastChar;
 952
 953     if (IdentifierStr == "def") return tok_def;
 954     if (IdentifierStr == "extern") return tok_extern;
 955     if (IdentifierStr == "if") return tok_if;
 956     if (IdentifierStr == "then") return tok_then;
 957     if (IdentifierStr == "else") return tok_else;
 958     if (IdentifierStr == "for") return tok_for;
 959     if (IdentifierStr == "in") return tok_in;
 960     return tok_identifier;
 961   }
 962
 963   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
 964     std::string NumStr;
 965     do {
 966       NumStr += LastChar;
 967       LastChar = getchar();
 968     } while (isdigit(LastChar) || LastChar == '.');
 969
 970     NumVal = strtod(NumStr.c_str(), 0);
 971     return tok_number;
 972   }
 973
 974   if (LastChar == '#') {
 975     // Comment until end of line.
 976     do LastChar = getchar();
 977     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
 978
 979     if (LastChar != EOF)
 980       return gettok();
 981   }
 982
 983   // Check for end of file.  Don't eat the EOF.
 984   if (LastChar == EOF)
 985     return tok_eof;
 986
 987   // Otherwise, just return the character as its ascii value.
 988   int ThisChar = LastChar;
 989   LastChar = getchar();
 990   return ThisChar;
 991 }
 992
 993 //===----------------------------------------------------------------------===//
 994 // Abstract Syntax Tree (aka Parse Tree)
 995 //===----------------------------------------------------------------------===//
 996
 997 /// ExprAST - Base class for all expression nodes.
 998 class ExprAST {
 999 public:
1000   virtual ~ExprAST() {}
1001   virtual Value *Codegen() = 0;
1002 };
1003
1004 /// NumberExprAST - Expression class for numeric literals like "1.0".
1005 class NumberExprAST : public ExprAST {
1006   double Val;
1007 public:
1008   NumberExprAST(double val) : Val(val) {}
1009   virtual Value *Codegen();
1010 };
1011
1012 /// VariableExprAST - Expression class for referencing a variable, like "a".
1013 class VariableExprAST : public ExprAST {
1014   std::string Name;
1015 public:
1016   VariableExprAST(const std::string &amp;name) : Name(name) {}
1017   virtual Value *Codegen();
1018 };
1019
1020 /// BinaryExprAST - Expression class for a binary operator.
1021 class BinaryExprAST : public ExprAST {
1022   char Op;
1023   ExprAST *LHS, *RHS;
1024 public:
1025   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1026     : Op(op), LHS(lhs), RHS(rhs) {}
1027   virtual Value *Codegen();
1028 };
1029
1030 /// CallExprAST - Expression class for function calls.
1031 class CallExprAST : public ExprAST {
1032   std::string Callee;
1033   std::vector&lt;ExprAST*&gt; Args;
1034 public:
1035   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1036     : Callee(callee), Args(args) {}
1037   virtual Value *Codegen();
1038 };
1039
1040 /// IfExprAST - Expression class for if/then/else.
1041 class IfExprAST : public ExprAST {
1042   ExprAST *Cond, *Then, *Else;
1043 public:
1044   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1045   : Cond(cond), Then(then), Else(_else) {}
1046   virtual Value *Codegen();
1047 };
1048
1049 /// ForExprAST - Expression class for for/in.
1050 class ForExprAST : public ExprAST {
1051   std::string VarName;
1052   ExprAST *Start, *End, *Step, *Body;
1053 public:
1054   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1055              ExprAST *step, ExprAST *body)
1056     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1057   virtual Value *Codegen();
1058 };
1059
1060 /// PrototypeAST - This class represents the "prototype" for a function,
1061 /// which captures its argument names as well as if it is an operator.
1062 class PrototypeAST {
1063   std::string Name;
1064   std::vector&lt;std::string&gt; Args;
1065 public:
1066   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1067     : Name(name), Args(args) {}
1068
1069   Function *Codegen();
1070 };
1071
1072 /// FunctionAST - This class represents a function definition itself.
1073 class FunctionAST {
1074   PrototypeAST *Proto;
1075   ExprAST *Body;
1076 public:
1077   FunctionAST(PrototypeAST *proto, ExprAST *body)
1078     : Proto(proto), Body(body) {}
1079
1080   Function *Codegen();
1081 };
1082
1083 //===----------------------------------------------------------------------===//
1084 // Parser
1085 //===----------------------------------------------------------------------===//
1086
1087 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1088 /// token the parser it looking at.  getNextToken reads another token from the
1089 /// lexer and updates CurTok with its results.
1090 static int CurTok;
1091 static int getNextToken() {
1092   return CurTok = gettok();
1093 }
1094
1095 /// BinopPrecedence - This holds the precedence for each binary operator that is
1096 /// defined.
1097 static std::map&lt;char, int&gt; BinopPrecedence;
1098
1099 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1100 static int GetTokPrecedence() {
1101   if (!isascii(CurTok))
1102     return -1;
1103
1104   // Make sure it's a declared binop.
1105   int TokPrec = BinopPrecedence[CurTok];
1106   if (TokPrec &lt;= 0) return -1;
1107   return TokPrec;
1108 }
1109
1110 /// Error* - These are little helper functions for error handling.
1111 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1112 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1113 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1114
1115 static ExprAST *ParseExpression();
1116
1117 /// identifierexpr
1118 ///   ::= identifier
1119 ///   ::= identifier '(' expression* ')'
1120 static ExprAST *ParseIdentifierExpr() {
1121   std::string IdName = IdentifierStr;
1122
1123   getNextToken();  // eat identifier.
1124
1125   if (CurTok != '(') // Simple variable ref.
1126     return new VariableExprAST(IdName);
1127
1128   // Call.
1129   getNextToken();  // eat (
1130   std::vector&lt;ExprAST*&gt; Args;
1131   if (CurTok != ')') {
1132     while (1) {
1133       ExprAST *Arg = ParseExpression();
1134       if (!Arg) return 0;
1135       Args.push_back(Arg);
1136
1137       if (CurTok == ')') break;
1138
1139       if (CurTok != ',')
1140         return Error("Expected ')'");
1141       getNextToken();
1142     }
1143   }
1144
1145   // Eat the ')'.
1146   getNextToken();
1147
1148   return new CallExprAST(IdName, Args);
1149 }
1150
1151 /// numberexpr ::= number
1152 static ExprAST *ParseNumberExpr() {
1153   ExprAST *Result = new NumberExprAST(NumVal);
1154   getNextToken(); // consume the number
1155   return Result;
1156 }
1157
1158 /// parenexpr ::= '(' expression ')'
1159 static ExprAST *ParseParenExpr() {
1160   getNextToken();  // eat (.
1161   ExprAST *V = ParseExpression();
1162   if (!V) return 0;
1163
1164   if (CurTok != ')')
1165     return Error("expected ')'");
1166   getNextToken();  // eat ).
1167   return V;
1168 }
1169
1170 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1171 static ExprAST *ParseIfExpr() {
1172   getNextToken();  // eat the if.
1173
1174   // condition.
1175   ExprAST *Cond = ParseExpression();
1176   if (!Cond) return 0;
1177
1178   if (CurTok != tok_then)
1179     return Error("expected then");
1180   getNextToken();  // eat the then
1181
1182   ExprAST *Then = ParseExpression();
1183   if (Then == 0) return 0;
1184
1185   if (CurTok != tok_else)
1186     return Error("expected else");
1187
1188   getNextToken();
1189
1190   ExprAST *Else = ParseExpression();
1191   if (!Else) return 0;
1192
1193   return new IfExprAST(Cond, Then, Else);
1194 }
1195
1196 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1197 static ExprAST *ParseForExpr() {
1198   getNextToken();  // eat the for.
1199
1200   if (CurTok != tok_identifier)
1201     return Error("expected identifier after for");
1202
1203   std::string IdName = IdentifierStr;
1204   getNextToken();  // eat identifier.
1205
1206   if (CurTok != '=')
1207     return Error("expected '=' after for");
1208   getNextToken();  // eat '='.
1209
1210
1211   ExprAST *Start = ParseExpression();
1212   if (Start == 0) return 0;
1213   if (CurTok != ',')
1214     return Error("expected ',' after for start value");
1215   getNextToken();
1216
1217   ExprAST *End = ParseExpression();
1218   if (End == 0) return 0;
1219
1220   // The step value is optional.
1221   ExprAST *Step = 0;
1222   if (CurTok == ',') {
1223     getNextToken();
1224     Step = ParseExpression();
1225     if (Step == 0) return 0;
1226   }
1227
1228   if (CurTok != tok_in)
1229     return Error("expected 'in' after for");
1230   getNextToken();  // eat 'in'.
1231
1232   ExprAST *Body = ParseExpression();
1233   if (Body == 0) return 0;
1234
1235   return new ForExprAST(IdName, Start, End, Step, Body);
1236 }
1237
1238
1239 /// primary
1240 ///   ::= identifierexpr
1241 ///   ::= numberexpr
1242 ///   ::= parenexpr
1243 ///   ::= ifexpr
1244 ///   ::= forexpr
1245 static ExprAST *ParsePrimary() {
1246   switch (CurTok) {
1247   default: return Error("unknown token when expecting an expression");
1248   case tok_identifier: return ParseIdentifierExpr();
1249   case tok_number:     return ParseNumberExpr();
1250   case '(':            return ParseParenExpr();
1251   case tok_if:         return ParseIfExpr();
1252   case tok_for:        return ParseForExpr();
1253   }
1254 }
1255
1256 /// binoprhs
1257 ///   ::= ('+' primary)*
1258 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1259   // If this is a binop, find its precedence.
1260   while (1) {
1261     int TokPrec = GetTokPrecedence();
1262
1263     // If this is a binop that binds at least as tightly as the current binop,
1264     // consume it, otherwise we are done.
1265     if (TokPrec &lt; ExprPrec)
1266       return LHS;
1267
1268     // Okay, we know this is a binop.
1269     int BinOp = CurTok;
1270     getNextToken();  // eat binop
1271
1272     // Parse the primary expression after the binary operator.
1273     ExprAST *RHS = ParsePrimary();
1274     if (!RHS) return 0;
1275
1276     // If BinOp binds less tightly with RHS than the operator after RHS, let
1277     // the pending operator take RHS as its LHS.
1278     int NextPrec = GetTokPrecedence();
1279     if (TokPrec &lt; NextPrec) {
1280       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1281       if (RHS == 0) return 0;
1282     }
1283
1284     // Merge LHS/RHS.
1285     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1286   }
1287 }
1288
1289 /// expression
1290 ///   ::= primary binoprhs
1291 ///
1292 static ExprAST *ParseExpression() {
1293   ExprAST *LHS = ParsePrimary();
1294   if (!LHS) return 0;
1295
1296   return ParseBinOpRHS(0, LHS);
1297 }
1298
1299 /// prototype
1300 ///   ::= id '(' id* ')'
1301 static PrototypeAST *ParsePrototype() {
1302   if (CurTok != tok_identifier)
1303     return ErrorP("Expected function name in prototype");
1304
1305   std::string FnName = IdentifierStr;
1306   getNextToken();
1307
1308   if (CurTok != '(')
1309     return ErrorP("Expected '(' in prototype");
1310
1311   std::vector&lt;std::string&gt; ArgNames;
1312   while (getNextToken() == tok_identifier)
1313     ArgNames.push_back(IdentifierStr);
1314   if (CurTok != ')')
1315     return ErrorP("Expected ')' in prototype");
1316
1317   // success.
1318   getNextToken();  // eat ')'.
1319
1320   return new PrototypeAST(FnName, ArgNames);
1321 }
1322
1323 /// definition ::= 'def' prototype expression
1324 static FunctionAST *ParseDefinition() {
1325   getNextToken();  // eat def.
1326   PrototypeAST *Proto = ParsePrototype();
1327   if (Proto == 0) return 0;
1328
1329   if (ExprAST *E = ParseExpression())
1330     return new FunctionAST(Proto, E);
1331   return 0;
1332 }
1333
1334 /// toplevelexpr ::= expression
1335 static FunctionAST *ParseTopLevelExpr() {
1336   if (ExprAST *E = ParseExpression()) {
1337     // Make an anonymous proto.
1338     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1339     return new FunctionAST(Proto, E);
1340   }
1341   return 0;
1342 }
1343
1344 /// external ::= 'extern' prototype
1345 static PrototypeAST *ParseExtern() {
1346   getNextToken();  // eat extern.
1347   return ParsePrototype();
1348 }
1349
1350 //===----------------------------------------------------------------------===//
1351 // Code Generation
1352 //===----------------------------------------------------------------------===//
1353
1354 static Module *TheModule;
1355 static LLVMFoldingBuilder Builder;
1356 static std::map&lt;std::string, Value*&gt; NamedValues;
1357 static FunctionPassManager *TheFPM;
1358
1359 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1360
1361 Value *NumberExprAST::Codegen() {
1362   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1363 }
1364
1365 Value *VariableExprAST::Codegen() {
1366   // Look this variable up in the function.
1367   Value *V = NamedValues[Name];
1368   return V ? V : ErrorV("Unknown variable name");
1369 }
1370
1371 Value *BinaryExprAST::Codegen() {
1372   Value *L = LHS-&gt;Codegen();
1373   Value *R = RHS-&gt;Codegen();
1374   if (L == 0 || R == 0) return 0;
1375
1376   switch (Op) {
1377   case '+': return Builder.CreateAdd(L, R, "addtmp");
1378   case '-': return Builder.CreateSub(L, R, "subtmp");
1379   case '*': return Builder.CreateMul(L, R, "multmp");
1380   case '&lt;':
1381     L = Builder.CreateFCmpULT(L, R, "multmp");
1382     // Convert bool 0/1 to double 0.0 or 1.0
1383     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1384   default: return ErrorV("invalid binary operator");
1385   }
1386 }
1387
1388 Value *CallExprAST::Codegen() {
1389   // Look up the name in the global module table.
1390   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1391   if (CalleeF == 0)
1392     return ErrorV("Unknown function referenced");
1393
1394   // If argument mismatch error.
1395   if (CalleeF-&gt;arg_size() != Args.size())
1396     return ErrorV("Incorrect # arguments passed");
1397
1398   std::vector&lt;Value*&gt; ArgsV;
1399   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1400     ArgsV.push_back(Args[i]-&gt;Codegen());
1401     if (ArgsV.back() == 0) return 0;
1402   }
1403
1404   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1405 }
1406
1407 Value *IfExprAST::Codegen() {
1408   Value *CondV = Cond-&gt;Codegen();
1409   if (CondV == 0) return 0;
1410
1411   // Convert condition to a bool by comparing equal to 0.0.
1412   CondV = Builder.CreateFCmpONE(CondV,
1413                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1414                                 "ifcond");
1415
1416   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1417
1418   // Create blocks for the then and else cases.  Insert the 'then' block at the
1419   // end of the function.
1420   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1421   BasicBlock *ElseBB = new BasicBlock("else");
1422   BasicBlock *MergeBB = new BasicBlock("ifcont");
1423
1424   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1425
1426   // Emit then value.
1427   Builder.SetInsertPoint(ThenBB);
1428
1429   Value *ThenV = Then-&gt;Codegen();
1430   if (ThenV == 0) return 0;
1431
1432   Builder.CreateBr(MergeBB);
1433   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1434   ThenBB = Builder.GetInsertBlock();
1435
1436   // Emit else block.
1437   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1438   Builder.SetInsertPoint(ElseBB);
1439
1440   Value *ElseV = Else-&gt;Codegen();
1441   if (ElseV == 0) return 0;
1442
1443   Builder.CreateBr(MergeBB);
1444   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1445   ElseBB = Builder.GetInsertBlock();
1446
1447   // Emit merge block.
1448   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1449   Builder.SetInsertPoint(MergeBB);
1450   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1451
1452   PN-&gt;addIncoming(ThenV, ThenBB);
1453   PN-&gt;addIncoming(ElseV, ElseBB);
1454   return PN;
1455 }
1456
1457 Value *ForExprAST::Codegen() {
1458   // Output this as:
1459   //   ...
1460   //   start = startexpr
1461   //   goto loop
1462   // loop:
1463   //   variable = phi [start, loopheader], [nextvariable, loopend]
1464   //   ...
1465   //   bodyexpr
1466   //   ...
1467   // loopend:
1468   //   step = stepexpr
1469   //   nextvariable = variable + step
1470   //   endcond = endexpr
1471   //   br endcond, loop, endloop
1472   // outloop:
1473
1474   // Emit the start code first, without 'variable' in scope.
1475   Value *StartVal = Start-&gt;Codegen();
1476   if (StartVal == 0) return 0;
1477
1478   // Make the new basic block for the loop header, inserting after current
1479   // block.
1480   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1481   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1482   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1483
1484   // Insert an explicit fall through from the current block to the LoopBB.
1485   Builder.CreateBr(LoopBB);
1486
1487   // Start insertion in LoopBB.
1488   Builder.SetInsertPoint(LoopBB);
1489
1490   // Start the PHI node with an entry for Start.
1491   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1492   Variable-&gt;addIncoming(StartVal, PreheaderBB);
1493
1494   // Within the loop, the variable is defined equal to the PHI node.  If it
1495   // shadows an existing variable, we have to restore it, so save it now.
1496   Value *OldVal = NamedValues[VarName];
1497   NamedValues[VarName] = Variable;
1498
1499   // Emit the body of the loop.  This, like any other expr, can change the
1500   // current BB.  Note that we ignore the value computed by the body, but don't
1501   // allow an error.
1502   if (Body-&gt;Codegen() == 0)
1503     return 0;
1504
1505   // Emit the step value.
1506   Value *StepVal;
1507   if (Step) {
1508     StepVal = Step-&gt;Codegen();
1509     if (StepVal == 0) return 0;
1510   } else {
1511     // If not specified, use 1.0.
1512     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1513   }
1514
1515   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1516
1517   // Compute the end condition.
1518   Value *EndCond = End-&gt;Codegen();
1519   if (EndCond == 0) return EndCond;
1520
1521   // Convert condition to a bool by comparing equal to 0.0.
1522   EndCond = Builder.CreateFCmpONE(EndCond,
1523                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1524                                   "loopcond");
1525
1526   // Create the "after loop" block and insert it.
1527   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1528   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1529
1530   // Insert the conditional branch into the end of LoopEndBB.
1531   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1532
1533   // Any new code will be inserted in AfterBB.
1534   Builder.SetInsertPoint(AfterBB);
1535
1536   // Add a new entry to the PHI node for the backedge.
1537   Variable-&gt;addIncoming(NextVar, LoopEndBB);
1538
1539   // Restore the unshadowed variable.
1540   if (OldVal)
1541     NamedValues[VarName] = OldVal;
1542   else
1543     NamedValues.erase(VarName);
1544
1545
1546   // for expr always returns 0.0.
1547   return Constant::getNullValue(Type::DoubleTy);
1548 }
1549
1550 Function *PrototypeAST::Codegen() {
1551   // Make the function type:  double(double,double) etc.
1552   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1553   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1554
1555   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1556
1557   // If F conflicted, there was already something named 'Name'.  If it has a
1558   // body, don't allow redefinition or reextern.
1559   if (F-&gt;getName() != Name) {
1560     // Delete the one we just made and get the existing one.
1561     F-&gt;eraseFromParent();
1562     F = TheModule-&gt;getFunction(Name);
1563
1564     // If F already has a body, reject this.
1565     if (!F-&gt;empty()) {
1566       ErrorF("redefinition of function");
1567       return 0;
1568     }
1569
1570     // If F took a different number of args, reject.
1571     if (F-&gt;arg_size() != Args.size()) {
1572       ErrorF("redefinition of function with different # args");
1573       return 0;
1574     }
1575   }
1576
1577   // Set names for all arguments.
1578   unsigned Idx = 0;
1579   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1580        ++AI, ++Idx) {
1581     AI-&gt;setName(Args[Idx]);
1582
1583     // Add arguments to variable symbol table.
1584     NamedValues[Args[Idx]] = AI;
1585   }
1586
1587   return F;
1588 }
1589
1590 Function *FunctionAST::Codegen() {
1591   NamedValues.clear();
1592
1593   Function *TheFunction = Proto-&gt;Codegen();
1594   if (TheFunction == 0)
1595     return 0;
1596
1597   // Create a new basic block to start insertion into.
1598   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1599   Builder.SetInsertPoint(BB);
1600
1601   if (Value *RetVal = Body-&gt;Codegen()) {
1602     // Finish off the function.
1603     Builder.CreateRet(RetVal);
1604
1605     // Validate the generated code, checking for consistency.
1606     verifyFunction(*TheFunction);
1607
1608     // Optimize the function.
1609     TheFPM-&gt;run(*TheFunction);
1610
1611     return TheFunction;
1612   }
1613
1614   // Error reading body, remove function.
1615   TheFunction-&gt;eraseFromParent();
1616   return 0;
1617 }
1618
1619 //===----------------------------------------------------------------------===//
1620 // Top-Level parsing and JIT Driver
1621 //===----------------------------------------------------------------------===//
1622
1623 static ExecutionEngine *TheExecutionEngine;
1624
1625 static void HandleDefinition() {
1626   if (FunctionAST *F = ParseDefinition()) {
1627     if (Function *LF = F-&gt;Codegen()) {
1628       fprintf(stderr, "Read function definition:");
1629       LF-&gt;dump();
1630     }
1631   } else {
1632     // Skip token for error recovery.
1633     getNextToken();
1634   }
1635 }
1636
1637 static void HandleExtern() {
1638   if (PrototypeAST *P = ParseExtern()) {
1639     if (Function *F = P-&gt;Codegen()) {
1640       fprintf(stderr, "Read extern: ");
1641       F-&gt;dump();
1642     }
1643   } else {
1644     // Skip token for error recovery.
1645     getNextToken();
1646   }
1647 }
1648
1649 static void HandleTopLevelExpression() {
1650   // Evaluate a top level expression into an anonymous function.
1651   if (FunctionAST *F = ParseTopLevelExpr()) {
1652     if (Function *LF = F-&gt;Codegen()) {
1653       // JIT the function, returning a function pointer.
1654       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1655
1656       // Cast it to the right type (takes no arguments, returns a double) so we
1657       // can call it as a native function.
1658       double (*FP)() = (double (*)())FPtr;
1659       fprintf(stderr, "Evaluated to %f\n", FP());
1660     }
1661   } else {
1662     // Skip token for error recovery.
1663     getNextToken();
1664   }
1665 }
1666
1667 /// top ::= definition | external | expression | ';'
1668 static void MainLoop() {
1669   while (1) {
1670     fprintf(stderr, "ready&gt; ");
1671     switch (CurTok) {
1672     case tok_eof:    return;
1673     case ';':        getNextToken(); break;  // ignore top level semicolons.
1674     case tok_def:    HandleDefinition(); break;
1675     case tok_extern: HandleExtern(); break;
1676     default:         HandleTopLevelExpression(); break;
1677     }
1678   }
1679 }
1680
1681
1682
1683 //===----------------------------------------------------------------------===//
1684 // "Library" functions that can be "extern'd" from user code.
1685 //===----------------------------------------------------------------------===//
1686
1687 /// putchard - putchar that takes a double and returns 0.
1688 extern "C"
1689 double putchard(double X) {
1690   putchar((char)X);
1691   return 0;
1692 }
1693
1694 //===----------------------------------------------------------------------===//
1695 // Main driver code.
1696 //===----------------------------------------------------------------------===//
1697
1698 int main() {
1699   // Install standard binary operators.
1700   // 1 is lowest precedence.
1701   BinopPrecedence['&lt;'] = 10;
1702   BinopPrecedence['+'] = 20;
1703   BinopPrecedence['-'] = 20;
1704   BinopPrecedence['*'] = 40;  // highest.
1705
1706   // Prime the first token.
1707   fprintf(stderr, "ready&gt; ");
1708   getNextToken();
1709
1710   // Make the module, which holds all the code.
1711   TheModule = new Module("my cool jit");
1712
1713   // Create the JIT.
1714   TheExecutionEngine = ExecutionEngine::create(TheModule);
1715
1716   {
1717     ExistingModuleProvider OurModuleProvider(TheModule);
1718     FunctionPassManager OurFPM(&amp;OurModuleProvider);
1719
1720     // Set up the optimizer pipeline.  Start with registering info about how the
1721     // target lays out data structures.
1722     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1723     // Do simple "peephole" optimizations and bit-twiddling optzns.
1724     OurFPM.add(createInstructionCombiningPass());
1725     // Reassociate expressions.
1726     OurFPM.add(createReassociatePass());
1727     // Eliminate Common SubExpressions.
1728     OurFPM.add(createGVNPass());
1729     // Simplify the control flow graph (deleting unreachable blocks, etc).
1730     OurFPM.add(createCFGSimplificationPass());
1731     // Set the global so the code gen can use this.
1732     TheFPM = &amp;OurFPM;
1733
1734     // Run the main "interpreter loop" now.
1735     MainLoop();
1736
1737     TheFPM = 0;
1738   }  // Free module provider and pass manager.
1739
1740
1741   // Print out all of the generated code.
1742   TheModule-&gt;dump();
1743   return 0;
1744 }
1745 </pre>
1746 </div>
1747
1748 </div>
1749
1750 <!-- *********************************************************************** -->
1751 <hr>
1752 <address>
1753   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1754   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1755   <a href="http://validator.w3.org/check/referer"><img
1756   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1757
1758   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1759   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1760   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1761 </address>
1762 </body>
1763 </html>