docs/tutorial/LangImpl5.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Control Flow</title>
   7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   8   <meta name="author" content="Chris Lattner">
   9   <link rel="stylesheet" href="../llvm.css" type="text/css">
  10 </head>
  11
  12 <body>
  13
  14 <div class="doc_title">Kaleidoscope: Extending the Language: Control Flow</div>
  15
  16 <div class="doc_author">
  17   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  18 </div>
  19
  20 <!-- *********************************************************************** -->
  21 <div class="doc_section"><a name="intro">Part 5 Introduction</a></div>
  22 <!-- *********************************************************************** -->
  23
  24 <div class="doc_text">
  25
  26 <p>Welcome to Part 5 of the "<a href="index.html">Implementing a language with
  27 LLVM</a>" tutorial.  Parts 1-4 described the implementation of the simple
  28 Kaleidoscope language and included support for generating LLVM IR, following by
  29 optimizations and a JIT compiler.  Unfortunately, as presented, Kaleidoscope is
  30 mostly useless: it has no control flow other than call and return.  This means
  31 that you can't have conditional branches in the code, significantly limiting its
  32 power.  In this episode of "build that compiler", we'll extend Kaleidoscope to
  33 have an if/then/else expression plus a simple looping construct.</p>
  34
  35 </div>
  36
  37 <!-- *********************************************************************** -->
  38 <div class="doc_section"><a name="ifthen">If/Then/Else</a></div>
  39 <!-- *********************************************************************** -->
  40
  41 <div class="doc_text">
  42
  43 <p>
  44 Extending Kaleidoscope to support if/then/else is quite straight-forward.  It
  45 basically requires adding lexer support for this "new" concept to the lexer,
  46 parser, AST, and LLVM code emitter.  This example is nice, because it shows how
  47 easy it is to "grow" a language over time, incrementally extending it as new
  48 ideas are discovered.</p>
  49
  50 <p>Before we get going on "how" we do this extension, lets talk about what we
  51 want.  The basic idea is that we want to be able to write this sort of thing:
  52 </p>
  53
  54 <div class="doc_code">
  55 <pre>
  56 def fib(x)
  57   if x &lt; 3 then
  58     1
  59   else
  60     fib(x-1)+fib(x-2);
  61 </pre>
  62 </div>
  63
  64 <p>In Kaleidoscope, every construct is an expression: there are no statements.
  65 As such, the if/then/else expression needs to return a value like any other.
  66 Since we're using a mostly functional form, we'll have it evaluate its
  67 conditional, then return the 'then' or 'else' value based on how the condition
  68 was resolved.  This is very similar to the C "?:" expression.</p>
  69
  70 <p>The semantics of the if/then/else expression is that it first evaluates the
  71 condition to a boolean equality value: 0.0 is false and everything else is true.
  72 If the condition is true, the first subexpression is evaluated and returned, if
  73 the condition is false, the second subexpression is evaluated and returned.
  74 Since Kaleidoscope allows side-effects, this behavior is important to nail down.
  75 </p>
  76
  77 <p>Now that we know what we want, lets break this down into its constituent
  78 pieces.</p>
  79
  80 </div>
  81
  82 <!-- ======================================================================= -->
  83 <div class="doc_subsubsection"><a name="iflexer">Lexer Extensions for
  84 If/Then/Else</a></div>
  85 <!-- ======================================================================= -->
  86
  87
  88 <div class="doc_text">
  89
  90 <p>The lexer extensions are straight-forward.  First we add new enum values
  91 for the relevant tokens:</p>
  92
  93 <div class="doc_code">
  94 <pre>
  95   // control
  96   tok_if = -6, tok_then = -7, tok_else = -8,
  97 </pre>
  98 </div>
  99
 100 <p>Once we have that, we recognize the new keywords in the lexer, pretty simple
 101 stuff:</p>
 102
 103 <div class="doc_code">
 104 <pre>
 105     ...
 106     if (IdentifierStr == "def") return tok_def;
 107     if (IdentifierStr == "extern") return tok_extern;
 108     <b>if (IdentifierStr == "if") return tok_if;
 109     if (IdentifierStr == "then") return tok_then;
 110     if (IdentifierStr == "else") return tok_else;</b>
 111     return tok_identifier;
 112 </pre>
 113 </div>
 114
 115 </div>
 116
 117 <!-- ======================================================================= -->
 118 <div class="doc_subsubsection"><a name="ifast">AST Extensions for
 119  If/Then/Else </a></div>
 120 <!-- ======================================================================= -->
 121
 122 <div class="doc_text">
 123
 124 <p>To represent the new expression we add a new AST node for it:</p>
 125
 126 <div class="doc_code">
 127 <pre>
 128 /// IfExprAST - Expression class for if/then/else.
 129 class IfExprAST : public ExprAST {
 130   ExprAST *Cond, *Then, *Else;
 131 public:
 132   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
 133     : Cond(cond), Then(then), Else(_else) {}
 134   virtual Value *Codegen();
 135 };
 136 </pre>
 137 </div>
 138
 139 <p>The AST node just has pointers to the various subexpressions.</p>
 140
 141 </div>
 142
 143 <!-- ======================================================================= -->
 144 <div class="doc_subsubsection"><a name="ifparser">Parser Extensions for
 145 If/Then/Else </a></div>
 146 <!-- ======================================================================= -->
 147
 148 <div class="doc_text">
 149
 150 <p>Now that we have the relevant tokens coming from the lexer and we have the
 151 AST node to build, our parsing logic is relatively straight-forward.  First we
 152 define a new parsing function:</p>
 153
 154 <div class="doc_code">
 155 <pre>
 156 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
 157 static ExprAST *ParseIfExpr() {
 158   getNextToken();  // eat the if.
 159
 160   // condition.
 161   ExprAST *Cond = ParseExpression();
 162   if (!Cond) return 0;
 163
 164   if (CurTok != tok_then)
 165     return Error("expected then");
 166   getNextToken();  // eat the then
 167
 168   ExprAST *Then = ParseExpression();
 169   if (Then == 0) return 0;
 170
 171   if (CurTok != tok_else)
 172     return Error("expected else");
 173
 174   getNextToken();
 175
 176   ExprAST *Else = ParseExpression();
 177   if (!Else) return 0;
 178
 179   return new IfExprAST(Cond, Then, Else);
 180 }
 181 </pre>
 182 </div>
 183
 184 <p>Next we hook it up as a primary expression:</p>
 185
 186 <div class="doc_code">
 187 <pre>
 188 static ExprAST *ParsePrimary() {
 189   switch (CurTok) {
 190   default: return Error("unknown token when expecting an expression");
 191   case tok_identifier: return ParseIdentifierExpr();
 192   case tok_number:     return ParseNumberExpr();
 193   case '(':            return ParseParenExpr();
 194   <b>case tok_if:         return ParseIfExpr();</b>
 195   }
 196 }
 197 </pre>
 198 </div>
 199
 200 </div>
 201
 202 <!-- ======================================================================= -->
 203 <div class="doc_subsubsection"><a name="ifir">LLVM IR for If/Then/Else</a></div>
 204 <!-- ======================================================================= -->
 205
 206 <div class="doc_text">
 207
 208 <p>Now that we have it parsing and building the AST, the final piece is adding
 209 LLVM code generation support.  This is the most interesting part of the
 210 if/then/else example, because this is where it starts to introduce new concepts.
 211 All of the code above has been described in previous chapters fairly thoroughly.
 212 </p>
 213
 214 <p>To motivate the code we want to produce, lets take a look at a simple
 215 example.  Consider:</p>
 216
 217 <div class="doc_code">
 218 <pre>
 219 extern foo();
 220 extern bar();
 221 def baz(x) if x then foo() else bar();
 222 </pre>
 223 </div>
 224
 225 <p>If you disable optimizations, the code you'll (soon) get from Kaleidoscope
 226 looks like this:</p>
 227
 228 <div class="doc_code">
 229 <pre>
 230 declare double @foo()
 231
 232 declare double @bar()
 233
 234 define double @baz(double %x) {
 235 entry:
 236         %ifcond = fcmp one double %x, 0.000000e+00
 237         br i1 %ifcond, label %then, label %else
 238
 239 then:           ; preds = %entry
 240         %calltmp = call double @foo()
 241         br label %ifcont
 242
 243 else:           ; preds = %entry
 244         %calltmp1 = call double @bar()
 245         br label %ifcont
 246
 247 ifcont:         ; preds = %else, %then
 248         %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
 249         ret double %iftmp
 250 }
 251 </pre>
 252 </div>
 253
 254 <p>To visualize the control flow graph, you can use a nifty feature of the LLVM
 255 '<a href="http://llvm.org/cmds/opt.html">opt</a>' tool.  If you put this LLVM IR
 256 into "t.ll" and run "<tt>llvm-as &lt; t.ll | opt -analyze -view-cfg</tt>", <a
 257 href="../ProgrammersManual.html#ViewGraph">a window will pop up</a> and you'll
 258 see this graph:</p>
 259
 260 <center><img src="LangImpl5-cfg.png" alt="Example CFG" width="423"
 261 height="315"></center>
 262
 263 <p>Another way to get this is to call "<tt>F-&gt;viewCFG()</tt>" or
 264 "<tt>F-&gt;viewCFGOnly()</tt>" (where F is a "<tt>Function*</tt>") either by
 265 inserting actual calls into the code and recompiling or by calling these in the
 266 debugger.  LLVM has many nice features for visualizing various graphs.</p>
 267
 268 <p>Coming back to the generated code, it is fairly simple: the entry block
 269 evaluates the conditional expression ("x" in our case here) and compares the
 270 result to 0.0 with the "<tt><a href="../LangRef.html#i_fcmp">fcmp</a> one</tt>"
 271 instruction ('one' is "ordered and not equal").  Based on the result of this
 272 expression, the code jumps to either the "then" or "else" blocks, which contain
 273 the expressions for the true/false case.</p>
 274
 275 <p>Once the then/else blocks is finished executing, they both branch back to the
 276 else block to execute the code that happens after the if/then/else.  In this
 277 case the only thing left to do is to return to the caller of the function.  The
 278 question then becomes: how does the code know which expression to return?</p>
 279
 280 <p>The answer to this question involves an important SSA operation: the
 281 <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Phi
 282 operation</a>.  If you're not familiar with SSA, <a
 283 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">the wikipedia
 284 article</a> is a good introduction and there are various other introductions to
 285 it available on your favorite search engine.  The short version is that
 286 "execution" of the Phi operation requires "remembering" which block control came
 287 from.  The Phi operation takes on the value corresponding to the input control
 288 block.  In this case, if control comes in from the "then" block, it gets the
 289 value of "calltmp".  If control comes from the "else" block, it gets the value
 290 of "calltmp1".</p>
 291
 292 <p>At this point, you are probably starting to think "on no! this means my
 293 simple and elegant front-end will have to start generating SSA form in order to
 294 use LLVM!".  Fortunately, this is not the case, and we strongly advise
 295 <em>not</em> implementing an SSA construction algorithm in your front-end
 296 unless there is an amazingly good reason to do so.  In practice, there are two
 297 sorts of values that float around in code written in your average imperative
 298 programming language that might need Phi nodes:</p>
 299
 300 <ol>
 301 <li>Code that involves user variables: <tt>x = 1; x = x + 1; </tt></li>
 302 <li>Values that are implicit in the structure of your AST, such as the phi node
 303 in this case.</li>
 304 </ol>
 305
 306 <p>In <a href="LangImpl7.html">Chapter 7</a> of this tutorial ("mutable
 307 variables"), we'll talk about #1
 308 in depth.  For now, just believe me that you don't need SSA construction to
 309 handle them.  For #2, you have the choice of using the techniques that we will
 310 describe for #1, or you can insert Phi nodes directly if convenient.  In this
 311 case, it is really really easy to generate the Phi node, so we choose to do it
 312 directly.</p>
 313
 314 <p>Okay, enough of the motivation and overview, lets generate code!</p>
 315
 316 </div>
 317
 318 <!-- ======================================================================= -->
 319 <div class="doc_subsubsection"><a name="ifcodegen">Code Generation for
 320 If/Then/Else</a></div>
 321 <!-- ======================================================================= -->
 322
 323 <div class="doc_text">
 324
 325 <p>In order to generate code for this, we implement the <tt>Codegen</tt> method
 326 for <tt>IfExprAST</tt>:</p>
 327
 328 <div class="doc_code">
 329 <pre>
 330 Value *IfExprAST::Codegen() {
 331   Value *CondV = Cond-&gt;Codegen();
 332   if (CondV == 0) return 0;
 333
 334   // Convert condition to a bool by comparing equal to 0.0.
 335   CondV = Builder.CreateFCmpONE(CondV,
 336                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 337                                 "ifcond");
 338 </pre>
 339 </div>
 340
 341 <p>This code is straight-forward and similar to what we saw before.  We emit the
 342 expression for the condition, then compare that value to zero to get a truth
 343 value as a 1-bit (bool) value.</p>
 344
 345 <div class="doc_code">
 346 <pre>
 347   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 348
 349   // Create blocks for the then and else cases.  Insert the 'then' block at the
 350   // end of the function.
 351   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
 352   BasicBlock *ElseBB = new BasicBlock("else");
 353   BasicBlock *MergeBB = new BasicBlock("ifcont");
 354
 355   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
 356 </pre>
 357 </div>
 358
 359 <p>This code creates the basic blocks that are related to the if/then/else
 360 statement, and correspond directly to the blocks in the example above.  The
 361 first line of this gets the current Function object that is being built.  It
 362 gets this by asking the builder for the current BasicBlock, and asking that
 363 block for its "parent" (the function it is currently embedded into).</p>
 364
 365 <p>Once it has that, it creates three blocks.  Note that it passes "TheFunction"
 366 into the constructor for the "then" block.  This causes the constructor to
 367 automatically insert the new block onto the end of the specified function.  The
 368 other two blocks are created, but aren't yet inserted into the function.</p>
 369
 370 <p>Once the blocks are created, we can emit the conditional branch that chooses
 371 between them.  Note that creating new blocks does not implicitly affect the
 372 LLVMBuilder, so it is still inserting into the block that the condition
 373 went into.  Also note that it is creating a branch to the "then" block and the
 374 "else" block, even though the "else" block isn't inserted into the function yet.
 375 This is all ok: it is the standard way that LLVM supports forward
 376 references.</p>
 377
 378 <div class="doc_code">
 379 <pre>
 380   // Emit then value.
 381   Builder.SetInsertPoint(ThenBB);
 382
 383   Value *ThenV = Then-&gt;Codegen();
 384   if (ThenV == 0) return 0;
 385
 386   Builder.CreateBr(MergeBB);
 387   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
 388   ThenBB = Builder.GetInsertBlock();
 389 </pre>
 390 </div>
 391
 392 <p>After the conditional branch is inserted, we move the builder to start
 393 inserting into the "then" block.  Strictly speaking, this call moves the
 394 insertion point to be at the end of the specified block.  However, since the
 395 "then" block is empty, it also starts out by inserting at the beginning of the
 396 block.  :)</p>
 397
 398 <p>Once the insertion point is set, we recursively codegen the "then" expression
 399 from the AST.  To finish off the then block, we create an unconditional branch
 400 to the merge block.  One interesting (and very important) aspect of the LLVM IR
 401 is that it <a href="../LangRef.html#functionstructure">requires all basic blocks
 402 to be "terminated"</a> with a <a href="../LangRef.html#terminators">control flow
 403 instruction</a> such as return or branch.  This means that all control flow,
 404 <em>including fall throughs</em> must be made explicit in the LLVM IR.  If you
 405 violate this rule, the verifier will emit an error.</p>
 406
 407 <p>The final line here is quite subtle, but is very important.  The basic issue
 408 is that when we create the Phi node in the merge block, we need to set up the
 409 block/value pairs that indicate how the Phi will work.  Importantly, the Phi
 410 node expects to have an entry for each predecessor of the block in the CFG.  Why
 411 then are we getting the current block when we just set it to ThenBB 5 lines
 412 above?  The problem is that the "Then" expression may actually itself change the
 413 block that the Builder is emitting into if, for example, it contains a nested
 414 "if/then/else" expression.  Because calling Codegen recursively could
 415 arbitrarily change the notion of the current block, we are required to get an
 416 up-to-date value for code that will set up the Phi node.</p>
 417
 418 <div class="doc_code">
 419 <pre>
 420   // Emit else block.
 421   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
 422   Builder.SetInsertPoint(ElseBB);
 423
 424   Value *ElseV = Else-&gt;Codegen();
 425   if (ElseV == 0) return 0;
 426
 427   Builder.CreateBr(MergeBB);
 428   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
 429   ElseBB = Builder.GetInsertBlock();
 430 </pre>
 431 </div>
 432
 433 <p>Code generation for the 'else' block is basically identical to codegen for
 434 the 'then' block.  The only significant difference is the first line, which adds
 435 the 'else' block to the function.  Recall previously that the 'else' block was
 436 created, but not added to the function.  Now that the 'then' and 'else' blocks
 437 are emitted, we can finish up with the merge code:</p>
 438
 439 <div class="doc_code">
 440 <pre>
 441   // Emit merge block.
 442   TheFunction->getBasicBlockList().push_back(MergeBB);
 443   Builder.SetInsertPoint(MergeBB);
 444   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
 445
 446   PN->addIncoming(ThenV, ThenBB);
 447   PN->addIncoming(ElseV, ElseBB);
 448   return PN;
 449 }
 450 </pre>
 451 </div>
 452
 453 <p>The first two lines here are now familiar: the first adds the "merge" block
 454 to the Function object (it was previously floating, like the else block above).
 455 The second block changes the insertion point so that newly created code will go
 456 into the "merge" block.  Once that is done, we need to create the PHI node and
 457 set up the block/value pairs for the PHI.</p>
 458
 459 <p>Finally, the CodeGen function returns the phi node as the value computed by
 460 the if/then/else expression.  In our example above, this returned value will
 461 feed into the code for the top-level function, which will create the return
 462 instruction.</p>
 463
 464 <p>Overall, we now have the ability to execution conditional code in
 465 Kaleidoscope.  With this extension, Kaleidoscope is a fairly complete language
 466 that can calculate a wide variety of numeric functions.  Next up we'll add
 467 another useful expression that is familiar from non-functional languages...</p>
 468
 469 </div>
 470
 471 <!-- *********************************************************************** -->
 472 <div class="doc_section"><a name="for">'for' Loop Expression</a></div>
 473 <!-- *********************************************************************** -->
 474
 475 <div class="doc_text">
 476
 477 <p>Now that we know how to add basic control flow constructs to the language,
 478 we have the tools to add more powerful things.  Lets add something more
 479 aggressive, a 'for' expression:</p>
 480
 481 <div class="doc_code">
 482 <pre>
 483  extern putchard(char)
 484  def printstar(n)
 485    for i = 1, i &lt; n, 1.0 in
 486      putchard(42);  # ascii 42 = '*'
 487
 488  # print 100 '*' characters
 489  printstar(100);
 490 </pre>
 491 </div>
 492
 493 <p>This expression defines a new variable ("i" in this case) which iterates from
 494 a starting value, while the condition ("i &lt; n" in this case) is true,
 495 incrementing by an optional step value ("1.0" in this case).  If the step value
 496 is omitted, it defaults to 1.0.  While the loop is true, it executes its
 497 body expression.  Because we don't have anything better to return, we'll just
 498 define the loop as always returning 0.0.  In the future when we have mutable
 499 variables, it will get more useful.</p>
 500
 501 <p>As before, lets talk about the changes that we need to Kaleidoscope to
 502 support this.</p>
 503
 504 </div>
 505
 506 <!-- ======================================================================= -->
 507 <div class="doc_subsubsection"><a name="forlexer">Lexer Extensions for
 508 the 'for' Loop</a></div>
 509 <!-- ======================================================================= -->
 510
 511 <div class="doc_text">
 512
 513 <p>The lexer extensions are the same sort of thing as for if/then/else:</p>
 514
 515 <div class="doc_code">
 516 <pre>
 517   ... in enum Token ...
 518   // control
 519   tok_if = -6, tok_then = -7, tok_else = -8,
 520 <b>  tok_for = -9, tok_in = -10</b>
 521
 522   ... in gettok ...
 523   if (IdentifierStr == "def") return tok_def;
 524   if (IdentifierStr == "extern") return tok_extern;
 525   if (IdentifierStr == "if") return tok_if;
 526   if (IdentifierStr == "then") return tok_then;
 527   if (IdentifierStr == "else") return tok_else;
 528   <b>if (IdentifierStr == "for") return tok_for;
 529   if (IdentifierStr == "in") return tok_in;</b>
 530   return tok_identifier;
 531 </pre>
 532 </div>
 533
 534 </div>
 535
 536 <!-- ======================================================================= -->
 537 <div class="doc_subsubsection"><a name="forast">AST Extensions for
 538 the 'for' Loop</a></div>
 539 <!-- ======================================================================= -->
 540
 541 <div class="doc_text">
 542
 543 <p>The AST node is similarly simple.  It basically boils down to capturing
 544 the variable name and the consituent expressions in the node.</p>
 545
 546 <div class="doc_code">
 547 <pre>
 548 /// ForExprAST - Expression class for for/in.
 549 class ForExprAST : public ExprAST {
 550   std::string VarName;
 551   ExprAST *Start, *End, *Step, *Body;
 552 public:
 553   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
 554              ExprAST *step, ExprAST *body)
 555     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
 556   virtual Value *Codegen();
 557 };
 558 </pre>
 559 </div>
 560
 561 </div>
 562
 563 <!-- ======================================================================= -->
 564 <div class="doc_subsubsection"><a name="forparser">Parser Extensions for
 565 the 'for' Loop</a></div>
 566 <!-- ======================================================================= -->
 567
 568 <div class="doc_text">
 569
 570 <p>The parser code is also fairly standard.  The only interesting thing here is
 571 handling of the optional step value.  The parser code handles it by checking to
 572 see if the second comma is present.  If not, it sets the step value to null in
 573 the AST node:</p>
 574
 575 <div class="doc_code">
 576 <pre>
 577 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
 578 static ExprAST *ParseForExpr() {
 579   getNextToken();  // eat the for.
 580
 581   if (CurTok != tok_identifier)
 582     return Error("expected identifier after for");
 583
 584   std::string IdName = IdentifierStr;
 585   getNextToken();  // eat identifer.
 586
 587   if (CurTok != '=')
 588     return Error("expected '=' after for");
 589   getNextToken();  // eat '='.
 590
 591
 592   ExprAST *Start = ParseExpression();
 593   if (Start == 0) return 0;
 594   if (CurTok != ',')
 595     return Error("expected ',' after for start value");
 596   getNextToken();
 597
 598   ExprAST *End = ParseExpression();
 599   if (End == 0) return 0;
 600
 601   // The step value is optional.
 602   ExprAST *Step = 0;
 603   if (CurTok == ',') {
 604     getNextToken();
 605     Step = ParseExpression();
 606     if (Step == 0) return 0;
 607   }
 608
 609   if (CurTok != tok_in)
 610     return Error("expected 'in' after for");
 611   getNextToken();  // eat 'in'.
 612
 613   ExprAST *Body = ParseExpression();
 614   if (Body == 0) return 0;
 615
 616   return new ForExprAST(IdName, Start, End, Step, Body);
 617 }
 618 </pre>
 619 </div>
 620
 621 </div>
 622
 623 <!-- ======================================================================= -->
 624 <div class="doc_subsubsection"><a name="forir">LLVM IR for
 625 the 'for' Loop</a></div>
 626 <!-- ======================================================================= -->
 627
 628 <div class="doc_text">
 629
 630 <p>Now we get to the good part: the LLVM IR we want to generate for this thing.
 631 With the simple example above, we get this LLVM IR (note that this dump is
 632 generated with optimizations disabled):
 633 </p>
 634
 635 <div class="doc_code">
 636 <pre>
 637 declare double @putchard(double)
 638
 639 define double @printstar(double %n) {
 640 entry:
 641         ; initial value = 1.0 (inlined into phi)
 642         br label %loop
 643
 644 loop:           ; preds = %loop, %entry
 645         %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
 646         ; body
 647         %calltmp = call double @putchard( double 4.200000e+01 )
 648         ; increment
 649         %nextvar = add double %i, 1.000000e+00
 650
 651         ; termination test
 652         %multmp = fcmp ult double %i, %n
 653         %booltmp = uitofp i1 %multmp to double
 654         %loopcond = fcmp one double %booltmp, 0.000000e+00
 655         br i1 %loopcond, label %loop, label %afterloop
 656
 657 afterloop:              ; preds = %loop
 658         ; loop always returns 0.0
 659         ret double 0.000000e+00
 660 }
 661 </pre>
 662 </div>
 663
 664 <p>This loop contains all the same constructs we saw before: a phi node, several
 665 expressions, and some basic blocks.  Lets see how this fits together.</p>
 666
 667 </div>
 668
 669 <!-- ======================================================================= -->
 670 <div class="doc_subsubsection"><a name="forcodegen">Code Generation for
 671 the 'for' Loop</a></div>
 672 <!-- ======================================================================= -->
 673
 674 <div class="doc_text">
 675
 676 <p>The first part of codegen is very simple: we just output the start expression
 677 for the loop value:</p>
 678
 679 <div class="doc_code">
 680 <pre>
 681 Value *ForExprAST::Codegen() {
 682   // Emit the start code first, without 'variable' in scope.
 683   Value *StartVal = Start-&gt;Codegen();
 684   if (StartVal == 0) return 0;
 685 </pre>
 686 </div>
 687
 688 <p>With this out of the way, the next step is to set up the LLVM basic block
 689 for the start of the loop body.  In the case above, the whole loop body is one
 690 block, but remember that the body code itself could consist of multiple blocks
 691 (e.g. if it is a if/then/else expression).</p>
 692
 693 <div class="doc_code">
 694 <pre>
 695   // Make the new basic block for the loop header, inserting after current
 696   // block.
 697   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 698   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
 699   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
 700
 701   // Insert an explicit fall through from the current block to the LoopBB.
 702   Builder.CreateBr(LoopBB);
 703 </pre>
 704 </div>
 705
 706 <p>This code is similar to what we saw for if/then/else.  Because we will need
 707 it to create the Phi node, we remember the block that falls through into the
 708 loop.  Once we have that, we create the actual block that starts the loop and
 709 create an unconditional branch for the fall-through between the two blocks.</p>
 710
 711 <div class="doc_code">
 712 <pre>
 713   // Start insertion in LoopBB.
 714   Builder.SetInsertPoint(LoopBB);
 715
 716   // Start the PHI node with an entry for Start.
 717   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
 718   Variable-&gt;addIncoming(StartVal, PreheaderBB);
 719 </pre>
 720 </div>
 721
 722 <p>Now that the "preheader" for the loop is set up, we switch to emitting code
 723 for the loop body.  To begin with, we move the insertion point and create the
 724 PHI node for the loop induction variable.  SInce we already know the incoming
 725 value for the starting value, we add it to the Phi node.  Note that the Phi will
 726 eventually get a second value for the backedge, but we can't set it up yet
 727 (because it doesn't exist!).</p>
 728
 729 <div class="doc_code">
 730 <pre>
 731   // Within the loop, the variable is defined equal to the PHI node.  If it
 732   // shadows an existing variable, we have to restore it, so save it now.
 733   Value *OldVal = NamedValues[VarName];
 734   NamedValues[VarName] = Variable;
 735
 736   // Emit the body of the loop.  This, like any other expr, can change the
 737   // current BB.  Note that we ignore the value computed by the body, but don't
 738   // allow an error.
 739   if (Body-&gt;Codegen() == 0)
 740     return 0;
 741 </pre>
 742 </div>
 743
 744 <p>Now the code starts to get more interesting.  Our 'for' loop introduces a new
 745 variable to the symbol table.  This means that our symbol table can now contain
 746 either function arguments or loop variables.  To handle this, before we codegen
 747 the body of the loop, we add the loop variable as the current value for its
 748 name.  Note that it is possible that there is a variable of the same name in the
 749 outer scope.  It would be easy to make this an error (emit an error and return
 750 null if there is already an entry for VarName) but we choose to allow shadowing
 751 of variables.  In order to handle this correctly, we remember the Value that
 752 we are potentially shadowing in <tt>OldVal</tt> (which will be null if there is
 753 no shadowed variable).</p>
 754
 755 <p>Once the loop variable is set into the symbol table, the code recursively
 756 codegen's the body.  This allows the body to use the loop variable: any
 757 references to it will naturally find it in the symbol table.</p>
 758
 759 <div class="doc_code">
 760 <pre>
 761   // Emit the step value.
 762   Value *StepVal;
 763   if (Step) {
 764     StepVal = Step-&gt;Codegen();
 765     if (StepVal == 0) return 0;
 766   } else {
 767     // If not specified, use 1.0.
 768     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
 769   }
 770
 771   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
 772 </pre>
 773 </div>
 774
 775 <p>Now that the body is emitted, we compute the next value of the iteration
 776 variable by adding the step value or 1.0 if it isn't present. '<tt>NextVar</tt>'
 777 will be the value of the loop variable on the next iteration of the loop.</p>
 778
 779 <div class="doc_code">
 780 <pre>
 781   // Compute the end condition.
 782   Value *EndCond = End-&gt;Codegen();
 783   if (EndCond == 0) return EndCond;
 784
 785   // Convert condition to a bool by comparing equal to 0.0.
 786   EndCond = Builder.CreateFCmpONE(EndCond,
 787                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
 788                                   "loopcond");
 789 </pre>
 790 </div>
 791
 792 <p>Finally, we evaluate the exit value of the loop, to determine whether the
 793 loop should exit.  This mirrors the condition evaluation for the if/then/else
 794 statement.</p>
 795
 796 <div class="doc_code">
 797 <pre>
 798   // Create the "after loop" block and insert it.
 799   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
 800   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
 801
 802   // Insert the conditional branch into the end of LoopEndBB.
 803   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
 804
 805   // Any new code will be inserted in AfterBB.
 806   Builder.SetInsertPoint(AfterBB);
 807 </pre>
 808 </div>
 809
 810 <p>With the code for the body of the loop complete, we just need to finish up
 811 the control flow for it.  This remembers the end block (for the phi node), then
 812 creates the block for the loop exit ("afterloop").  Based on the value of the
 813 exit condition, it creates a conditional branch that chooses between executing
 814 the loop again and exiting the loop.  Any future code is emitted in the
 815 "afterloop" block, so it sets the insertion position to it.</p>
 816
 817 <div class="doc_code">
 818 <pre>
 819   // Add a new entry to the PHI node for the backedge.
 820   Variable-&gt;addIncoming(NextVar, LoopEndBB);
 821
 822   // Restore the unshadowed variable.
 823   if (OldVal)
 824     NamedValues[VarName] = OldVal;
 825   else
 826     NamedValues.erase(VarName);
 827
 828   // for expr always returns 0.0.
 829   return Constant::getNullValue(Type::DoubleTy);
 830 }
 831 </pre>
 832 </div>
 833
 834 <p>The final code handles various cleanups: now that we have the "NextVar"
 835 value, we can add the incoming value to the loop PHI node.  After that, we
 836 remove the loop variable from the symbol table, so that it isn't in scope after
 837 the for loop.  Finally, code generation of the for loop always returns 0.0, so
 838 that is what we return from <tt>ForExprAST::Codegen</tt>.</p>
 839
 840 <p>With this, we conclude the "adding control flow to Kaleidoscope" chapter of
 841 the tutorial.  We added two control flow constructs, and used them to motivate
 842 a couple of aspects of the LLVM IR that are important for front-end implementors
 843 to know.  In the next chapter of our saga, we will get a bit crazier and add
 844 operator overloading to our poor innocent language.</p>
 845
 846 </div>
 847
 848 <!-- *********************************************************************** -->
 849 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 850 <!-- *********************************************************************** -->
 851
 852 <div class="doc_text">
 853
 854 <p>
 855 Here is the complete code listing for our running example, enhanced with the
 856 if/then/else and for expressions..  To build this example, use:
 857 </p>
 858
 859 <div class="doc_code">
 860 <pre>
 861    # Compile
 862    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 863    # Run
 864    ./toy
 865 </pre>
 866 </div>
 867
 868 <p>Here is the code:</p>
 869
 870 <div class="doc_code">
 871 <pre>
 872 #include "llvm/DerivedTypes.h"
 873 #include "llvm/ExecutionEngine/ExecutionEngine.h"
 874 #include "llvm/Module.h"
 875 #include "llvm/ModuleProvider.h"
 876 #include "llvm/PassManager.h"
 877 #include "llvm/Analysis/Verifier.h"
 878 #include "llvm/Target/TargetData.h"
 879 #include "llvm/Transforms/Scalar.h"
 880 #include "llvm/Support/LLVMBuilder.h"
 881 #include &lt;cstdio&gt;
 882 #include &lt;string&gt;
 883 #include &lt;map&gt;
 884 #include &lt;vector&gt;
 885 using namespace llvm;
 886
 887 //===----------------------------------------------------------------------===//
 888 // Lexer
 889 //===----------------------------------------------------------------------===//
 890
 891 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 892 // of these for known things.
 893 enum Token {
 894   tok_eof = -1,
 895
 896   // commands
 897   tok_def = -2, tok_extern = -3,
 898
 899   // primary
 900   tok_identifier = -4, tok_number = -5,
 901
 902   // control
 903   tok_if = -6, tok_then = -7, tok_else = -8,
 904   tok_for = -9, tok_in = -10
 905 };
 906
 907 static std::string IdentifierStr;  // Filled in if tok_identifier
 908 static double NumVal;              // Filled in if tok_number
 909
 910 /// gettok - Return the next token from standard input.
 911 static int gettok() {
 912   static int LastChar = ' ';
 913
 914   // Skip any whitespace.
 915   while (isspace(LastChar))
 916     LastChar = getchar();
 917
 918   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
 919     IdentifierStr = LastChar;
 920     while (isalnum((LastChar = getchar())))
 921       IdentifierStr += LastChar;
 922
 923     if (IdentifierStr == "def") return tok_def;
 924     if (IdentifierStr == "extern") return tok_extern;
 925     if (IdentifierStr == "if") return tok_if;
 926     if (IdentifierStr == "then") return tok_then;
 927     if (IdentifierStr == "else") return tok_else;
 928     if (IdentifierStr == "for") return tok_for;
 929     if (IdentifierStr == "in") return tok_in;
 930     return tok_identifier;
 931   }
 932
 933   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
 934     std::string NumStr;
 935     do {
 936       NumStr += LastChar;
 937       LastChar = getchar();
 938     } while (isdigit(LastChar) || LastChar == '.');
 939
 940     NumVal = strtod(NumStr.c_str(), 0);
 941     return tok_number;
 942   }
 943
 944   if (LastChar == '#') {
 945     // Comment until end of line.
 946     do LastChar = getchar();
 947     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
 948
 949     if (LastChar != EOF)
 950       return gettok();
 951   }
 952
 953   // Check for end of file.  Don't eat the EOF.
 954   if (LastChar == EOF)
 955     return tok_eof;
 956
 957   // Otherwise, just return the character as its ascii value.
 958   int ThisChar = LastChar;
 959   LastChar = getchar();
 960   return ThisChar;
 961 }
 962
 963 //===----------------------------------------------------------------------===//
 964 // Abstract Syntax Tree (aka Parse Tree)
 965 //===----------------------------------------------------------------------===//
 966
 967 /// ExprAST - Base class for all expression nodes.
 968 class ExprAST {
 969 public:
 970   virtual ~ExprAST() {}
 971   virtual Value *Codegen() = 0;
 972 };
 973
 974 /// NumberExprAST - Expression class for numeric literals like "1.0".
 975 class NumberExprAST : public ExprAST {
 976   double Val;
 977 public:
 978   NumberExprAST(double val) : Val(val) {}
 979   virtual Value *Codegen();
 980 };
 981
 982 /// VariableExprAST - Expression class for referencing a variable, like "a".
 983 class VariableExprAST : public ExprAST {
 984   std::string Name;
 985 public:
 986   VariableExprAST(const std::string &amp;name) : Name(name) {}
 987   virtual Value *Codegen();
 988 };
 989
 990 /// BinaryExprAST - Expression class for a binary operator.
 991 class BinaryExprAST : public ExprAST {
 992   char Op;
 993   ExprAST *LHS, *RHS;
 994 public:
 995   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
 996     : Op(op), LHS(lhs), RHS(rhs) {}
 997   virtual Value *Codegen();
 998 };
 999
1000 /// CallExprAST - Expression class for function calls.
1001 class CallExprAST : public ExprAST {
1002   std::string Callee;
1003   std::vector&lt;ExprAST*&gt; Args;
1004 public:
1005   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1006     : Callee(callee), Args(args) {}
1007   virtual Value *Codegen();
1008 };
1009
1010 /// IfExprAST - Expression class for if/then/else.
1011 class IfExprAST : public ExprAST {
1012   ExprAST *Cond, *Then, *Else;
1013 public:
1014   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1015   : Cond(cond), Then(then), Else(_else) {}
1016   virtual Value *Codegen();
1017 };
1018
1019 /// ForExprAST - Expression class for for/in.
1020 class ForExprAST : public ExprAST {
1021   std::string VarName;
1022   ExprAST *Start, *End, *Step, *Body;
1023 public:
1024   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1025              ExprAST *step, ExprAST *body)
1026     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1027   virtual Value *Codegen();
1028 };
1029
1030 /// PrototypeAST - This class represents the "prototype" for a function,
1031 /// which captures its argument names as well as if it is an operator.
1032 class PrototypeAST {
1033   std::string Name;
1034   std::vector&lt;std::string&gt; Args;
1035 public:
1036   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
1037     : Name(name), Args(args) {}
1038
1039   Function *Codegen();
1040 };
1041
1042 /// FunctionAST - This class represents a function definition itself.
1043 class FunctionAST {
1044   PrototypeAST *Proto;
1045   ExprAST *Body;
1046 public:
1047   FunctionAST(PrototypeAST *proto, ExprAST *body)
1048     : Proto(proto), Body(body) {}
1049
1050   Function *Codegen();
1051 };
1052
1053 //===----------------------------------------------------------------------===//
1054 // Parser
1055 //===----------------------------------------------------------------------===//
1056
1057 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1058 /// token the parser it looking at.  getNextToken reads another token from the
1059 /// lexer and updates CurTok with its results.
1060 static int CurTok;
1061 static int getNextToken() {
1062   return CurTok = gettok();
1063 }
1064
1065 /// BinopPrecedence - This holds the precedence for each binary operator that is
1066 /// defined.
1067 static std::map&lt;char, int&gt; BinopPrecedence;
1068
1069 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1070 static int GetTokPrecedence() {
1071   if (!isascii(CurTok))
1072     return -1;
1073
1074   // Make sure it's a declared binop.
1075   int TokPrec = BinopPrecedence[CurTok];
1076   if (TokPrec &lt;= 0) return -1;
1077   return TokPrec;
1078 }
1079
1080 /// Error* - These are little helper functions for error handling.
1081 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1082 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1083 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1084
1085 static ExprAST *ParseExpression();
1086
1087 /// identifierexpr
1088 ///   ::= identifer
1089 ///   ::= identifer '(' expression* ')'
1090 static ExprAST *ParseIdentifierExpr() {
1091   std::string IdName = IdentifierStr;
1092
1093   getNextToken();  // eat identifer.
1094
1095   if (CurTok != '(') // Simple variable ref.
1096     return new VariableExprAST(IdName);
1097
1098   // Call.
1099   getNextToken();  // eat (
1100   std::vector&lt;ExprAST*&gt; Args;
1101   if (CurTok != ')') {
1102     while (1) {
1103       ExprAST *Arg = ParseExpression();
1104       if (!Arg) return 0;
1105       Args.push_back(Arg);
1106
1107       if (CurTok == ')') break;
1108
1109       if (CurTok != ',')
1110         return Error("Expected ')'");
1111       getNextToken();
1112     }
1113   }
1114
1115   // Eat the ')'.
1116   getNextToken();
1117
1118   return new CallExprAST(IdName, Args);
1119 }
1120
1121 /// numberexpr ::= number
1122 static ExprAST *ParseNumberExpr() {
1123   ExprAST *Result = new NumberExprAST(NumVal);
1124   getNextToken(); // consume the number
1125   return Result;
1126 }
1127
1128 /// parenexpr ::= '(' expression ')'
1129 static ExprAST *ParseParenExpr() {
1130   getNextToken();  // eat (.
1131   ExprAST *V = ParseExpression();
1132   if (!V) return 0;
1133
1134   if (CurTok != ')')
1135     return Error("expected ')'");
1136   getNextToken();  // eat ).
1137   return V;
1138 }
1139
1140 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1141 static ExprAST *ParseIfExpr() {
1142   getNextToken();  // eat the if.
1143
1144   // condition.
1145   ExprAST *Cond = ParseExpression();
1146   if (!Cond) return 0;
1147
1148   if (CurTok != tok_then)
1149     return Error("expected then");
1150   getNextToken();  // eat the then
1151
1152   ExprAST *Then = ParseExpression();
1153   if (Then == 0) return 0;
1154
1155   if (CurTok != tok_else)
1156     return Error("expected else");
1157
1158   getNextToken();
1159
1160   ExprAST *Else = ParseExpression();
1161   if (!Else) return 0;
1162
1163   return new IfExprAST(Cond, Then, Else);
1164 }
1165
1166 /// forexpr ::= 'for' identifer '=' expr ',' expr (',' expr)? 'in' expression
1167 static ExprAST *ParseForExpr() {
1168   getNextToken();  // eat the for.
1169
1170   if (CurTok != tok_identifier)
1171     return Error("expected identifier after for");
1172
1173   std::string IdName = IdentifierStr;
1174   getNextToken();  // eat identifer.
1175
1176   if (CurTok != '=')
1177     return Error("expected '=' after for");
1178   getNextToken();  // eat '='.
1179
1180
1181   ExprAST *Start = ParseExpression();
1182   if (Start == 0) return 0;
1183   if (CurTok != ',')
1184     return Error("expected ',' after for start value");
1185   getNextToken();
1186
1187   ExprAST *End = ParseExpression();
1188   if (End == 0) return 0;
1189
1190   // The step value is optional.
1191   ExprAST *Step = 0;
1192   if (CurTok == ',') {
1193     getNextToken();
1194     Step = ParseExpression();
1195     if (Step == 0) return 0;
1196   }
1197
1198   if (CurTok != tok_in)
1199     return Error("expected 'in' after for");
1200   getNextToken();  // eat 'in'.
1201
1202   ExprAST *Body = ParseExpression();
1203   if (Body == 0) return 0;
1204
1205   return new ForExprAST(IdName, Start, End, Step, Body);
1206 }
1207
1208
1209 /// primary
1210 ///   ::= identifierexpr
1211 ///   ::= numberexpr
1212 ///   ::= parenexpr
1213 ///   ::= ifexpr
1214 ///   ::= forexpr
1215 static ExprAST *ParsePrimary() {
1216   switch (CurTok) {
1217   default: return Error("unknown token when expecting an expression");
1218   case tok_identifier: return ParseIdentifierExpr();
1219   case tok_number:     return ParseNumberExpr();
1220   case '(':            return ParseParenExpr();
1221   case tok_if:         return ParseIfExpr();
1222   case tok_for:        return ParseForExpr();
1223   }
1224 }
1225
1226 /// binoprhs
1227 ///   ::= ('+' primary)*
1228 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1229   // If this is a binop, find its precedence.
1230   while (1) {
1231     int TokPrec = GetTokPrecedence();
1232
1233     // If this is a binop that binds at least as tightly as the current binop,
1234     // consume it, otherwise we are done.
1235     if (TokPrec &lt; ExprPrec)
1236       return LHS;
1237
1238     // Okay, we know this is a binop.
1239     int BinOp = CurTok;
1240     getNextToken();  // eat binop
1241
1242     // Parse the primary expression after the binary operator.
1243     ExprAST *RHS = ParsePrimary();
1244     if (!RHS) return 0;
1245
1246     // If BinOp binds less tightly with RHS than the operator after RHS, let
1247     // the pending operator take RHS as its LHS.
1248     int NextPrec = GetTokPrecedence();
1249     if (TokPrec &lt; NextPrec) {
1250       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1251       if (RHS == 0) return 0;
1252     }
1253
1254     // Merge LHS/RHS.
1255     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1256   }
1257 }
1258
1259 /// expression
1260 ///   ::= primary binoprhs
1261 ///
1262 static ExprAST *ParseExpression() {
1263   ExprAST *LHS = ParsePrimary();
1264   if (!LHS) return 0;
1265
1266   return ParseBinOpRHS(0, LHS);
1267 }
1268
1269 /// prototype
1270 ///   ::= id '(' id* ')'
1271 static PrototypeAST *ParsePrototype() {
1272   if (CurTok != tok_identifier)
1273     return ErrorP("Expected function name in prototype");
1274
1275   std::string FnName = IdentifierStr;
1276   getNextToken();
1277
1278   if (CurTok != '(')
1279     return ErrorP("Expected '(' in prototype");
1280
1281   std::vector&lt;std::string&gt; ArgNames;
1282   while (getNextToken() == tok_identifier)
1283     ArgNames.push_back(IdentifierStr);
1284   if (CurTok != ')')
1285     return ErrorP("Expected ')' in prototype");
1286
1287   // success.
1288   getNextToken();  // eat ')'.
1289
1290   return new PrototypeAST(FnName, ArgNames);
1291 }
1292
1293 /// definition ::= 'def' prototype expression
1294 static FunctionAST *ParseDefinition() {
1295   getNextToken();  // eat def.
1296   PrototypeAST *Proto = ParsePrototype();
1297   if (Proto == 0) return 0;
1298
1299   if (ExprAST *E = ParseExpression())
1300     return new FunctionAST(Proto, E);
1301   return 0;
1302 }
1303
1304 /// toplevelexpr ::= expression
1305 static FunctionAST *ParseTopLevelExpr() {
1306   if (ExprAST *E = ParseExpression()) {
1307     // Make an anonymous proto.
1308     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1309     return new FunctionAST(Proto, E);
1310   }
1311   return 0;
1312 }
1313
1314 /// external ::= 'extern' prototype
1315 static PrototypeAST *ParseExtern() {
1316   getNextToken();  // eat extern.
1317   return ParsePrototype();
1318 }
1319
1320 //===----------------------------------------------------------------------===//
1321 // Code Generation
1322 //===----------------------------------------------------------------------===//
1323
1324 static Module *TheModule;
1325 static LLVMFoldingBuilder Builder;
1326 static std::map&lt;std::string, Value*&gt; NamedValues;
1327 static FunctionPassManager *TheFPM;
1328
1329 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1330
1331 Value *NumberExprAST::Codegen() {
1332   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1333 }
1334
1335 Value *VariableExprAST::Codegen() {
1336   // Look this variable up in the function.
1337   Value *V = NamedValues[Name];
1338   return V ? V : ErrorV("Unknown variable name");
1339 }
1340
1341 Value *BinaryExprAST::Codegen() {
1342   Value *L = LHS-&gt;Codegen();
1343   Value *R = RHS-&gt;Codegen();
1344   if (L == 0 || R == 0) return 0;
1345
1346   switch (Op) {
1347   case '+': return Builder.CreateAdd(L, R, "addtmp");
1348   case '-': return Builder.CreateSub(L, R, "subtmp");
1349   case '*': return Builder.CreateMul(L, R, "multmp");
1350   case '&lt;':
1351     L = Builder.CreateFCmpULT(L, R, "multmp");
1352     // Convert bool 0/1 to double 0.0 or 1.0
1353     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1354   default: return ErrorV("invalid binary operator");
1355   }
1356 }
1357
1358 Value *CallExprAST::Codegen() {
1359   // Look up the name in the global module table.
1360   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1361   if (CalleeF == 0)
1362     return ErrorV("Unknown function referenced");
1363
1364   // If argument mismatch error.
1365   if (CalleeF-&gt;arg_size() != Args.size())
1366     return ErrorV("Incorrect # arguments passed");
1367
1368   std::vector&lt;Value*&gt; ArgsV;
1369   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1370     ArgsV.push_back(Args[i]-&gt;Codegen());
1371     if (ArgsV.back() == 0) return 0;
1372   }
1373
1374   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1375 }
1376
1377 Value *IfExprAST::Codegen() {
1378   Value *CondV = Cond-&gt;Codegen();
1379   if (CondV == 0) return 0;
1380
1381   // Convert condition to a bool by comparing equal to 0.0.
1382   CondV = Builder.CreateFCmpONE(CondV,
1383                                 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1384                                 "ifcond");
1385
1386   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1387
1388   // Create blocks for the then and else cases.  Insert the 'then' block at the
1389   // end of the function.
1390   BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1391   BasicBlock *ElseBB = new BasicBlock("else");
1392   BasicBlock *MergeBB = new BasicBlock("ifcont");
1393
1394   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1395
1396   // Emit then value.
1397   Builder.SetInsertPoint(ThenBB);
1398
1399   Value *ThenV = Then-&gt;Codegen();
1400   if (ThenV == 0) return 0;
1401
1402   Builder.CreateBr(MergeBB);
1403   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1404   ThenBB = Builder.GetInsertBlock();
1405
1406   // Emit else block.
1407   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1408   Builder.SetInsertPoint(ElseBB);
1409
1410   Value *ElseV = Else-&gt;Codegen();
1411   if (ElseV == 0) return 0;
1412
1413   Builder.CreateBr(MergeBB);
1414   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1415   ElseBB = Builder.GetInsertBlock();
1416
1417   // Emit merge block.
1418   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1419   Builder.SetInsertPoint(MergeBB);
1420   PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1421
1422   PN-&gt;addIncoming(ThenV, ThenBB);
1423   PN-&gt;addIncoming(ElseV, ElseBB);
1424   return PN;
1425 }
1426
1427 Value *ForExprAST::Codegen() {
1428   // Output this as:
1429   //   ...
1430   //   start = startexpr
1431   //   goto loop
1432   // loop:
1433   //   variable = phi [start, loopheader], [nextvariable, loopend]
1434   //   ...
1435   //   bodyexpr
1436   //   ...
1437   // loopend:
1438   //   step = stepexpr
1439   //   nextvariable = variable + step
1440   //   endcond = endexpr
1441   //   br endcond, loop, endloop
1442   // outloop:
1443
1444   // Emit the start code first, without 'variable' in scope.
1445   Value *StartVal = Start-&gt;Codegen();
1446   if (StartVal == 0) return 0;
1447
1448   // Make the new basic block for the loop header, inserting after current
1449   // block.
1450   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1451   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1452   BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1453
1454   // Insert an explicit fall through from the current block to the LoopBB.
1455   Builder.CreateBr(LoopBB);
1456
1457   // Start insertion in LoopBB.
1458   Builder.SetInsertPoint(LoopBB);
1459
1460   // Start the PHI node with an entry for Start.
1461   PHINode *Variable = Builder.CreatePHI(Type::DoubleTy, VarName.c_str());
1462   Variable-&gt;addIncoming(StartVal, PreheaderBB);
1463
1464   // Within the loop, the variable is defined equal to the PHI node.  If it
1465   // shadows an existing variable, we have to restore it, so save it now.
1466   Value *OldVal = NamedValues[VarName];
1467   NamedValues[VarName] = Variable;
1468
1469   // Emit the body of the loop.  This, like any other expr, can change the
1470   // current BB.  Note that we ignore the value computed by the body, but don't
1471   // allow an error.
1472   if (Body-&gt;Codegen() == 0)
1473     return 0;
1474
1475   // Emit the step value.
1476   Value *StepVal;
1477   if (Step) {
1478     StepVal = Step-&gt;Codegen();
1479     if (StepVal == 0) return 0;
1480   } else {
1481     // If not specified, use 1.0.
1482     StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1483   }
1484
1485   Value *NextVar = Builder.CreateAdd(Variable, StepVal, "nextvar");
1486
1487   // Compute the end condition.
1488   Value *EndCond = End-&gt;Codegen();
1489   if (EndCond == 0) return EndCond;
1490
1491   // Convert condition to a bool by comparing equal to 0.0.
1492   EndCond = Builder.CreateFCmpONE(EndCond,
1493                                   ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1494                                   "loopcond");
1495
1496   // Create the "after loop" block and insert it.
1497   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1498   BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1499
1500   // Insert the conditional branch into the end of LoopEndBB.
1501   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1502
1503   // Any new code will be inserted in AfterBB.
1504   Builder.SetInsertPoint(AfterBB);
1505
1506   // Add a new entry to the PHI node for the backedge.
1507   Variable-&gt;addIncoming(NextVar, LoopEndBB);
1508
1509   // Restore the unshadowed variable.
1510   if (OldVal)
1511     NamedValues[VarName] = OldVal;
1512   else
1513     NamedValues.erase(VarName);
1514
1515
1516   // for expr always returns 0.0.
1517   return Constant::getNullValue(Type::DoubleTy);
1518 }
1519
1520 Function *PrototypeAST::Codegen() {
1521   // Make the function type:  double(double,double) etc.
1522   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1523   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1524
1525   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1526
1527   // If F conflicted, there was already something named 'Name'.  If it has a
1528   // body, don't allow redefinition or reextern.
1529   if (F-&gt;getName() != Name) {
1530     // Delete the one we just made and get the existing one.
1531     F-&gt;eraseFromParent();
1532     F = TheModule-&gt;getFunction(Name);
1533
1534     // If F already has a body, reject this.
1535     if (!F-&gt;empty()) {
1536       ErrorF("redefinition of function");
1537       return 0;
1538     }
1539
1540     // If F took a different number of args, reject.
1541     if (F-&gt;arg_size() != Args.size()) {
1542       ErrorF("redefinition of function with different # args");
1543       return 0;
1544     }
1545   }
1546
1547   // Set names for all arguments.
1548   unsigned Idx = 0;
1549   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1550        ++AI, ++Idx) {
1551     AI-&gt;setName(Args[Idx]);
1552
1553     // Add arguments to variable symbol table.
1554     NamedValues[Args[Idx]] = AI;
1555   }
1556
1557   return F;
1558 }
1559
1560 Function *FunctionAST::Codegen() {
1561   NamedValues.clear();
1562
1563   Function *TheFunction = Proto-&gt;Codegen();
1564   if (TheFunction == 0)
1565     return 0;
1566
1567   // Create a new basic block to start insertion into.
1568   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1569   Builder.SetInsertPoint(BB);
1570
1571   if (Value *RetVal = Body-&gt;Codegen()) {
1572     // Finish off the function.
1573     Builder.CreateRet(RetVal);
1574
1575     // Validate the generated code, checking for consistency.
1576     verifyFunction(*TheFunction);
1577
1578     // Optimize the function.
1579     TheFPM-&gt;run(*TheFunction);
1580
1581     return TheFunction;
1582   }
1583
1584   // Error reading body, remove function.
1585   TheFunction-&gt;eraseFromParent();
1586   return 0;
1587 }
1588
1589 //===----------------------------------------------------------------------===//
1590 // Top-Level parsing and JIT Driver
1591 //===----------------------------------------------------------------------===//
1592
1593 static ExecutionEngine *TheExecutionEngine;
1594
1595 static void HandleDefinition() {
1596   if (FunctionAST *F = ParseDefinition()) {
1597     if (Function *LF = F-&gt;Codegen()) {
1598       fprintf(stderr, "Read function definition:");
1599       LF-&gt;dump();
1600     }
1601   } else {
1602     // Skip token for error recovery.
1603     getNextToken();
1604   }
1605 }
1606
1607 static void HandleExtern() {
1608   if (PrototypeAST *P = ParseExtern()) {
1609     if (Function *F = P-&gt;Codegen()) {
1610       fprintf(stderr, "Read extern: ");
1611       F-&gt;dump();
1612     }
1613   } else {
1614     // Skip token for error recovery.
1615     getNextToken();
1616   }
1617 }
1618
1619 static void HandleTopLevelExpression() {
1620   // Evaluate a top level expression into an anonymous function.
1621   if (FunctionAST *F = ParseTopLevelExpr()) {
1622     if (Function *LF = F-&gt;Codegen()) {
1623       // JIT the function, returning a function pointer.
1624       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
1625
1626       // Cast it to the right type (takes no arguments, returns a double) so we
1627       // can call it as a native function.
1628       double (*FP)() = (double (*)())FPtr;
1629       fprintf(stderr, "Evaluated to %f\n", FP());
1630     }
1631   } else {
1632     // Skip token for error recovery.
1633     getNextToken();
1634   }
1635 }
1636
1637 /// top ::= definition | external | expression | ';'
1638 static void MainLoop() {
1639   while (1) {
1640     fprintf(stderr, "ready&gt; ");
1641     switch (CurTok) {
1642     case tok_eof:    return;
1643     case ';':        getNextToken(); break;  // ignore top level semicolons.
1644     case tok_def:    HandleDefinition(); break;
1645     case tok_extern: HandleExtern(); break;
1646     default:         HandleTopLevelExpression(); break;
1647     }
1648   }
1649 }
1650
1651
1652
1653 //===----------------------------------------------------------------------===//
1654 // "Library" functions that can be "extern'd" from user code.
1655 //===----------------------------------------------------------------------===//
1656
1657 /// putchard - putchar that takes a double and returns 0.
1658 extern "C"
1659 double putchard(double X) {
1660   putchar((char)X);
1661   return 0;
1662 }
1663
1664 //===----------------------------------------------------------------------===//
1665 // Main driver code.
1666 //===----------------------------------------------------------------------===//
1667
1668 int main() {
1669   // Install standard binary operators.
1670   // 1 is lowest precedence.
1671   BinopPrecedence['&lt;'] = 10;
1672   BinopPrecedence['+'] = 20;
1673   BinopPrecedence['-'] = 20;
1674   BinopPrecedence['*'] = 40;  // highest.
1675
1676   // Prime the first token.
1677   fprintf(stderr, "ready&gt; ");
1678   getNextToken();
1679
1680   // Make the module, which holds all the code.
1681   TheModule = new Module("my cool jit");
1682
1683   // Create the JIT.
1684   TheExecutionEngine = ExecutionEngine::create(TheModule);
1685
1686   {
1687     ExistingModuleProvider OurModuleProvider(TheModule);
1688     FunctionPassManager OurFPM(&amp;OurModuleProvider);
1689
1690     // Set up the optimizer pipeline.  Start with registering info about how the
1691     // target lays out data structures.
1692     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
1693     // Do simple "peephole" optimizations and bit-twiddling optzns.
1694     OurFPM.add(createInstructionCombiningPass());
1695     // Reassociate expressions.
1696     OurFPM.add(createReassociatePass());
1697     // Eliminate Common SubExpressions.
1698     OurFPM.add(createGVNPass());
1699     // Simplify the control flow graph (deleting unreachable blocks, etc).
1700     OurFPM.add(createCFGSimplificationPass());
1701     // Set the global so the code gen can use this.
1702     TheFPM = &amp;OurFPM;
1703
1704     // Run the main "interpreter loop" now.
1705     MainLoop();
1706
1707     TheFPM = 0;
1708   }  // Free module provider and pass manager.
1709
1710
1711   // Print out all of the generated code.
1712   TheModule-&gt;dump();
1713   return 0;
1714 }
1715 </pre>
1716 </div>
1717
1718 </div>
1719
1720 <!-- *********************************************************************** -->
1721 <hr>
1722 <address>
1723   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1724   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1725   <a href="http://validator.w3.org/check/referer"><img
1726   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1727
1728   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1729   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1730   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1731 </address>
1732 </body>
1733 </html>