docs/tutorial/LangImpl3.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
   7   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   8   <meta name="author" content="Chris Lattner">
   9   <link rel="stylesheet" href="../llvm.css" type="text/css">
  10 </head>
  11
  12 <body>
  13
  14 <div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
  15
  16 <ul>
  17 <li>Chapter 3
  18   <ol>
  19     <li><a href="#intro">Chapter 3 Introduction</a></li>
  20     <li><a href="#basics">Code Generation setup</a></li>
  21     <li><a href="#exprs">Expression Code Generation</a></li>
  22     <li><a href="#funcs">Function Code Generation</a></li>
  23     <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
  24     <li><a href="#code">Full Code Listing</a></li>
  25   </ol>
  26 </li>
  27 </ul>
  28
  29 <div class="doc_author">
  30   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  31 </div>
  32
  33 <!-- *********************************************************************** -->
  34 <div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
  35 <!-- *********************************************************************** -->
  36
  37 <div class="doc_text">
  38
  39 <p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
  40 with LLVM</a>" tutorial.  This chapter shows you how to transform the <a
  41 href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
  42 This will teach you a little bit about how LLVM does things, as well as
  43 demonstrate how easy it is to use.  It's much more work to build a lexer and
  44 parser than it is to generate LLVM IR code.
  45 </p>
  46
  47 </div>
  48
  49 <!-- *********************************************************************** -->
  50 <div class="doc_section"><a name="basics">Code Generation setup</a></div>
  51 <!-- *********************************************************************** -->
  52
  53 <div class="doc_text">
  54
  55 <p>
  56 In order to generate LLVM IR, we want some simple setup to get started.  First,
  57 we define virtual codegen methods in each AST class:</p>
  58
  59 <div class="doc_code">
  60 <pre>
  61 /// ExprAST - Base class for all expression nodes.
  62 class ExprAST {
  63 public:
  64   virtual ~ExprAST() {}
  65   <b>virtual Value *Codegen() = 0;</b>
  66 };
  67
  68 /// NumberExprAST - Expression class for numeric literals like "1.0".
  69 class NumberExprAST : public ExprAST {
  70   double Val;
  71 public:
  72   explicit NumberExprAST(double val) : Val(val) {}
  73   <b>virtual Value *Codegen();</b>
  74 };
  75 ...
  76 </pre>
  77 </div>
  78
  79 <p>The Codegen() method says to emit IR for that AST node and all things it
  80 depends on, and they all return an LLVM Value object.
  81 "Value" is the class used to represent a "<a
  82 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
  83 Assignment (SSA)</a> register" or "SSA value" in LLVM.  The most distinct aspect
  84 of SSA values is that their value is computed as the related instruction
  85 executes, and it does not get a new value until (and if) the instruction
  86 re-executes.  In order words, there is no way to "change" an SSA value.  For
  87 more information, please read up on <a
  88 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
  89 Assignment</a> - the concepts are really quite natural once you grok them.</p>
  90
  91 <p>Note that instead of adding virtual methods to the ExprAST class hierarchy,
  92 it could also make sense to use a visitor pattern or some other way to model
  93 this.  Again, this tutorial won't dwell on good software engineering practices:
  94 for our purposes, adding a virtual method is simplest.</p>
  95
  96 <p>The
  97 second thing we want is an "Error" method like we used for parser, which will
  98 be used to report errors found during code generation (for example, use of an
  99 undeclared parameter):</p>
 100
 101 <div class="doc_code">
 102 <pre>
 103 Value *ErrorV(const char *Str) { Error(Str); return 0; }
 104
 105 static Module *TheModule;
 106 static LLVMBuilder Builder;
 107 static std::map&lt;std::string, Value*&gt; NamedValues;
 108 </pre>
 109 </div>
 110
 111 <p>The static variables will be used during code generation.  <tt>TheModule</tt>
 112 is the LLVM construct that contains all of the functions and global variables in
 113 a chunk of code.  In many ways, it is the top-level structure that the LLVM IR
 114 uses to contain code.</p>
 115
 116 <p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
 117 LLVM instructions.  Instances of the <a
 118 href="http://llvm.org/doxygen/LLVMBuilder_8h-source.html"><tt>LLVMBuilder</tt>
 119 class</a> keeps track of the current place to
 120 insert instructions and has methods to create new instructions.</p>
 121
 122 <p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
 123 current scope and what their LLVM representation is.  In this form of
 124 Kaleidoscope, the only things that can be referenced are function parameters.
 125 As such, function parameters will be in this map when generating code for their
 126 function body.</p>
 127
 128 <p>
 129 With these basics in place, we can start talking about how to generate code for
 130 each expression.  Note that this assumes that the <tt>Builder</tt> has been set
 131 up to generate code <em>into</em> something.  For now, we'll assume that this
 132 has already been done, and we'll just use it to emit code.
 133 </p>
 134
 135 </div>
 136
 137 <!-- *********************************************************************** -->
 138 <div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
 139 <!-- *********************************************************************** -->
 140
 141 <div class="doc_text">
 142
 143 <p>Generating LLVM code for expression nodes is very straight-forward: less
 144 than 45 lines of commented code for all four of our expression nodes.  First,
 145 we'll do numeric literals:</p>
 146
 147 <div class="doc_code">
 148 <pre>
 149 Value *NumberExprAST::Codegen() {
 150   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
 151 }
 152 </pre>
 153 </div>
 154
 155 <p>In the LLVM IR, numeric constants are represented with the
 156 <tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
 157 internally (<tt>APFloat</tt> has the capability of holding floating point
 158 constants of <em>A</em>rbitrary <em>P</em>recision).  This code basically just
 159 creates and returns a <tt>ConstantFP</tt>.  Note that in the LLVM IR
 160 that constants are all uniqued together and shared.  For this reason, the API
 161 uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
 162
 163 <div class="doc_code">
 164 <pre>
 165 Value *VariableExprAST::Codegen() {
 166   // Look this variable up in the function.
 167   Value *V = NamedValues[Name];
 168   return V ? V : ErrorV("Unknown variable name");
 169 }
 170 </pre>
 171 </div>
 172
 173 <p>References to variables is also quite simple here.  In the simple version
 174 of Kaleidoscope, we assume that the variable has already been emited somewhere
 175 and its value is available.  In practice, the only values that can be in the
 176 <tt>NamedValues</tt> map are function arguments.  This
 177 code simply checks to see that the specified name is in the map (if not, an
 178 unknown variable is being referenced) and returns the value for it.</p>
 179
 180 <div class="doc_code">
 181 <pre>
 182 Value *BinaryExprAST::Codegen() {
 183   Value *L = LHS-&gt;Codegen();
 184   Value *R = RHS-&gt;Codegen();
 185   if (L == 0 || R == 0) return 0;
 186
 187   switch (Op) {
 188   case '+': return Builder.CreateAdd(L, R, "addtmp");
 189   case '-': return Builder.CreateSub(L, R, "subtmp");
 190   case '*': return Builder.CreateMul(L, R, "multmp");
 191   case '&lt;':
 192     L = Builder.CreateFCmpULT(L, R, "multmp");
 193     // Convert bool 0/1 to double 0.0 or 1.0
 194     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
 195   default: return ErrorV("invalid binary operator");
 196   }
 197 }
 198 </pre>
 199 </div>
 200
 201 <p>Binary operators start to get more interesting.  The basic idea here is that
 202 we recursively emit code for the left-hand side of the expression, then the
 203 right-hand side, then we compute the result of the binary expression.  In this
 204 code, we do a simple switch on the opcode to create the right LLVM instruction.
 205 </p>
 206
 207 <p>In this example, the LLVM builder class is starting to show its value.
 208 Because it knows where to insert the newly created instruction, you just have to
 209 specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
 210 operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
 211 for the generated instruction.  One nice thing about LLVM is that the name is
 212 just a hint: if there are multiple additions in a single function, the first
 213 will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
 214 giving it a name like "addtmp42".  Local value names for instructions are purely
 215 optional, but it makes it much easier to read the IR dumps.</p>
 216
 217 <p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
 218 have very strict type properties: for example, the Left and Right operators of
 219 an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
 220 type, and that the result of the add matches the operands.  Because all values
 221 in Kaleidoscope are doubles, this makes for very simple code for add, sub and
 222 mul.</p>
 223
 224 <p>On the other hand, LLVM specifies that the <a
 225 href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
 226 (a one bit integer).  However, Kaleidoscope wants the value to be a 0.0 or 1.0
 227 value.  In order to get these semantics, we combine the fcmp instruction with
 228 a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>.  This instruction
 229 converts its input integer into a floating point value by treating the input
 230 as an unsigned value.  In contrast, if we used the <a
 231 href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
 232 operator would return 0.0 and -1.0, depending on the input value.</p>
 233
 234 <div class="doc_code">
 235 <pre>
 236 Value *CallExprAST::Codegen() {
 237   // Look up the name in the global module table.
 238   Function *CalleeF = TheModule-&gt;getFunction(Callee);
 239   if (CalleeF == 0)
 240     return ErrorV("Unknown function referenced");
 241
 242   // If argument mismatch error.
 243   if (CalleeF-&gt;arg_size() != Args.size())
 244     return ErrorV("Incorrect # arguments passed");
 245
 246   std::vector&lt;Value*&gt; ArgsV;
 247   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
 248     ArgsV.push_back(Args[i]-&gt;Codegen());
 249     if (ArgsV.back() == 0) return 0;
 250   }
 251
 252   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
 253 }
 254 </pre>
 255 </div>
 256
 257 <p>Code generation for function calls is quite straight-forward with LLVM.  The
 258 code above first looks the name of the function up in the LLVM Module's symbol
 259 table.  Recall that the LLVM Module is the container that holds all of the
 260 functions we are JIT'ing.  By giving each function the same name as what the
 261 user specifies, we can use the LLVM symbol table to resolve function names for
 262 us.</p>
 263
 264 <p>Once we have the function to call, we recursively codegen each argument that
 265 is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
 266 instruction</a>.  Note that LLVM uses the native C calling conventions by
 267 default, allowing these calls to call into standard library functions like
 268 "sin" and "cos" with no additional effort.</p>
 269
 270 <p>This wraps up our handling of the four basic expressions that we have so far
 271 in Kaleidoscope.  Feel free to go in and add some more.  For example, by
 272 browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
 273 several other interesting instructions that are really easy to plug into our
 274 basic framework.</p>
 275
 276 </div>
 277
 278 <!-- *********************************************************************** -->
 279 <div class="doc_section"><a name="funcs">Function Code Generation</a></div>
 280 <!-- *********************************************************************** -->
 281
 282 <div class="doc_text">
 283
 284 <p>Code generation for prototypes and functions has to handle a number of
 285 details, which make their code less beautiful and elegant than expression code
 286 generation, but they illustrate some important points.  First, lets talk about
 287 code generation for prototypes: this is used both for function bodies as well
 288 as external function declarations.  The code starts with:</p>
 289
 290 <div class="doc_code">
 291 <pre>
 292 Function *PrototypeAST::Codegen() {
 293   // Make the function type:  double(double,double) etc.
 294   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
 295   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
 296
 297   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
 298 </pre>
 299 </div>
 300
 301 <p>This code packs a lot of power into a few lines.  Note first that this
 302 function returns a Function* instead of a Value*.  Because a "prototype" really
 303 talks about the external interface for a function (not the value computed by
 304 an expression), it makes sense for it to return the LLVM Function it corresponds
 305 to when codegen'd.</p>
 306
 307 <p>The next step is to create
 308 the <tt>FunctionType</tt> that should be used for a given Prototype.  Since all
 309 function arguments in Kaleidoscope are of type double, the first line creates
 310 a vector of "N" LLVM Double types.  It then uses the <tt>FunctionType::get</tt>
 311 method to create a function type that takes "N" doubles as arguments, returns
 312 one double as a result, and that is not vararg (the false parameter indicates
 313 this).  Note that Types in LLVM are uniqued just like Constants are, so you
 314 don't "new" a type, you "get" it.</p>
 315
 316 <p>The final line above actually creates the function that the prototype will
 317 correspond to.  This indicates which type, linkage, and name to use, and which
 318 module to insert into.  "<a href="LangRef.html#linkage">external linkage</a>"
 319 means that the function may be defined outside the current module and/or that it
 320 is callable by functions outside the module.  The Name passed in is the name the
 321 user specified: since "<tt>TheModule</tt>" is specified, this name is registered
 322 in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
 323 above.</p>
 324
 325 <div class="doc_code">
 326 <pre>
 327   // If F conflicted, there was already something named 'Name'.  If it has a
 328   // body, don't allow redefinition or reextern.
 329   if (F-&gt;getName() != Name) {
 330     // Delete the one we just made and get the existing one.
 331     F-&gt;eraseFromParent();
 332     F = TheModule-&gt;getFunction(Name);
 333 </pre>
 334 </div>
 335
 336 <p>The Module symbol table works just like the Function symbol table when it
 337 comes to name conflicts: if a new function is created with a name was previously
 338 added to the symbol table, it will get implicitly renamed when added to the
 339 Module.  The code above exploits this fact to tell if there was a previous
 340 definition of this function.</p>
 341
 342 <p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
 343 first, we want to allow 'extern'ing a function more than once, so long as the
 344 prototypes for the externs match (since all arguments have the same type, we
 345 just have to check that the number of arguments match).  Second, we want to
 346 allow 'extern'ing a function and then definining a body for it.  This is useful
 347 when defining mutually recursive functions.</p>
 348
 349 <p>In order to implement this, the code above first checks to see if there is
 350 a collision on the name of the function.  If so, it deletes the function we just
 351 created (by calling <tt>eraseFromParent</tt>) and then calling
 352 <tt>getFunction</tt> to get the existing function with the specified name.  Note
 353 that many APIs in LLVM have "erase" forms and "remove" forms.  The "remove" form
 354 unlinks the object from its parent (e.g. a Function from a Module) and returns
 355 it.  The "erase" form unlinks the object and then deletes it.</p>
 356
 357 <div class="doc_code">
 358 <pre>
 359     // If F already has a body, reject this.
 360     if (!F-&gt;empty()) {
 361       ErrorF("redefinition of function");
 362       return 0;
 363     }
 364
 365     // If F took a different number of args, reject.
 366     if (F-&gt;arg_size() != Args.size()) {
 367       ErrorF("redefinition of function with different # args");
 368       return 0;
 369     }
 370   }
 371 </pre>
 372 </div>
 373
 374 <p>In order to verify the logic above, we first check to see if the preexisting
 375 function is "empty".  In this case, empty means that it has no basic blocks in
 376 it, which means it has no body.  If it has no body, this means its a forward
 377 declaration.  Since we don't allow anything after a full definition of the
 378 function, the code rejects this case.  If the previous reference to a function
 379 was an 'extern', we simply verify that the number of arguments for that
 380 definition and this one match up.  If not, we emit an error.</p>
 381
 382 <div class="doc_code">
 383 <pre>
 384   // Set names for all arguments.
 385   unsigned Idx = 0;
 386   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
 387        ++AI, ++Idx) {
 388     AI-&gt;setName(Args[Idx]);
 389
 390     // Add arguments to variable symbol table.
 391     NamedValues[Args[Idx]] = AI;
 392   }
 393   return F;
 394 }
 395 </pre>
 396 </div>
 397
 398 <p>The last bit of code for prototypes loops over all of the arguments in the
 399 function, setting the name of the LLVM Argument objects to match and registering
 400 the arguments in the <tt>NamedValues</tt> map for future use by the
 401 <tt>VariableExprAST</tt> AST node.  Once this is set up, it returns the Function
 402 object to the caller.  Note that we don't check for conflicting
 403 argument names here (e.g. "extern foo(a b a)").  Doing so would be very
 404 straight-forward.</p>
 405
 406 <div class="doc_code">
 407 <pre>
 408 Function *FunctionAST::Codegen() {
 409   NamedValues.clear();
 410
 411   Function *TheFunction = Proto-&gt;Codegen();
 412   if (TheFunction == 0)
 413     return 0;
 414 </pre>
 415 </div>
 416
 417 <p>Code generation for function definitions starts out simply enough: first we
 418 codegen the prototype and verify that it is ok.  We also clear out the
 419 <tt>NamedValues</tt> map to make sure that there isn't anything in it from the
 420 last function we compiled.</p>
 421
 422 <div class="doc_code">
 423 <pre>
 424   // Create a new basic block to start insertion into.
 425   BasicBlock *BB = new BasicBlock("entry", TheFunction);
 426   Builder.SetInsertPoint(BB);
 427
 428   if (Value *RetVal = Body-&gt;Codegen()) {
 429 </pre>
 430 </div>
 431
 432 <p>Now we get to the point where the <tt>Builder</tt> is set up.  The first
 433 line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
 434 block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>.  The
 435 second line then tells the builder that new instructions should be inserted into
 436 the end of the new basic block.  Basic blocks in LLVM are an important part
 437 of functions that define the <a
 438 href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
 439 Since we don't have any control flow, our functions will only contain one
 440 block so far.  We'll fix this in a future installment :).</p>
 441
 442 <div class="doc_code">
 443 <pre>
 444   if (Value *RetVal = Body-&gt;Codegen()) {
 445     // Finish off the function.
 446     Builder.CreateRet(RetVal);
 447
 448     // Validate the generated code, checking for consistency.
 449     verifyFunction(*TheFunction);
 450     return TheFunction;
 451   }
 452 </pre>
 453 </div>
 454
 455 <p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
 456 the root expression of the function.  If no error happens, this emits code to
 457 compute the expression into the entry block and returns the value that was
 458 computed.  Assuming no error, we then create an LLVM <a
 459 href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
 460 Once the function is built, we call the <tt>verifyFunction</tt> function, which
 461 is provided by LLVM.  This function does a variety of consistency checks on the
 462 generated code, to determine if our compiler is doing everything right.  Using
 463 this is important: it can catch a lot of bugs.  Once the function is finished
 464 and validated, we return it.</p>
 465
 466 <div class="doc_code">
 467 <pre>
 468   // Error reading body, remove function.
 469   TheFunction-&gt;eraseFromParent();
 470   return 0;
 471 }
 472 </pre>
 473 </div>
 474
 475 <p>The only piece left here is handling of the error case.  For simplicity, we
 476 simply handle this by deleting the function we produced with the
 477 <tt>eraseFromParent</tt> method.  This allows the user to redefine a function
 478 that they incorrectly typed in before: if we didn't delete it, it would live in
 479 the symbol table, with a body, preventing future redefinition.</p>
 480
 481 <p>This code does have a bug though.  Since the <tt>PrototypeAST::Codegen</tt>
 482 can return a previously defined forward declaration, this can actually delete
 483 a forward declaration.  There are a number of ways to fix this bug, see what you
 484 can come up with!  Here is a testcase:</p>
 485
 486 <div class="doc_code">
 487 <pre>
 488 extern foo(a b);     # ok, defines foo.
 489 def foo(a b) c;      # error, 'c' is invalid.
 490 def bar() foo(1, 2); # error, unknown function "foo"
 491 </pre>
 492 </div>
 493
 494 </div>
 495
 496 <!-- *********************************************************************** -->
 497 <div class="doc_section"><a name="driver">Driver Changes and
 498 Closing Thoughts</a></div>
 499 <!-- *********************************************************************** -->
 500
 501 <div class="doc_text">
 502
 503 <p>
 504 For now, code generation to LLVM doesn't really get us much, except that we can
 505 look at the pretty IR calls.  The sample code inserts calls to Codegen into the
 506 "<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
 507 dumps out the LLVM IR.  This gives a nice way to look at the LLVM IR for simple
 508 functions.  For example:
 509 </p>
 510
 511 <div class="doc_code">
 512 <pre>
 513 ready> <b>4+5</b>;
 514 ready> Read top-level expression:
 515 define double @""() {
 516 entry:
 517         %addtmp = add double 4.000000e+00, 5.000000e+00
 518         ret double %addtmp
 519 }
 520 </pre>
 521 </div>
 522
 523 <p>Note how the parser turns the top-level expression into anonymous functions
 524 for us.  This will be handy when we add JIT support in the next chapter.  Also
 525 note that the code is very literally transcribed, no optimizations are being
 526 performed.  We will add optimizations explicitly in the next chapter.</p>
 527
 528 <div class="doc_code">
 529 <pre>
 530 ready&gt; <b>def foo(a b) a*a + 2*a*b + b*b;</b>
 531 ready&gt; Read function definition:
 532 define double @foo(double %a, double %b) {
 533 entry:
 534         %multmp = mul double %a, %a
 535         %multmp1 = mul double 2.000000e+00, %a
 536         %multmp2 = mul double %multmp1, %b
 537         %addtmp = add double %multmp, %multmp2
 538         %multmp3 = mul double %b, %b
 539         %addtmp4 = add double %addtmp, %multmp3
 540         ret double %addtmp4
 541 }
 542 </pre>
 543 </div>
 544
 545 <p>This shows some simple arithmetic. Notice the striking similarity to the
 546 LLVM builder calls that we use to create the instructions.</p>
 547
 548 <div class="doc_code">
 549 <pre>
 550 ready&gt; <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
 551 ready&gt; Read function definition:
 552 define double @bar(double %a) {
 553 entry:
 554         %calltmp = call double @foo( double %a, double 4.000000e+00 )
 555         %calltmp1 = call double @bar( double 3.133700e+04 )
 556         %addtmp = add double %calltmp, %calltmp1
 557         ret double %addtmp
 558 }
 559 </pre>
 560 </div>
 561
 562 <p>This shows some function calls.  Note that this function will take a long
 563 time to execute if you call it.  In the future we'll add conditional control
 564 flow to make recursion actually be useful :).</p>
 565
 566 <div class="doc_code">
 567 <pre>
 568 ready&gt; <b>extern cos(x);</b>
 569 ready&gt; Read extern:
 570 declare double @cos(double)
 571
 572 ready&gt; <b>cos(1.234);</b>
 573 ready&gt; Read top-level expression:
 574 define double @""() {
 575 entry:
 576         %calltmp = call double @cos( double 1.234000e+00 )
 577         ret double %calltmp
 578 }
 579 </pre>
 580 </div>
 581
 582 <p>This shows an extern for the libm "cos" function, and a call to it.</p>
 583
 584
 585 <div class="doc_code">
 586 <pre>
 587 ready&gt; <b>^D</b>
 588 ; ModuleID = 'my cool jit'
 589
 590 define double @""() {
 591 entry:
 592         %addtmp = add double 4.000000e+00, 5.000000e+00
 593         ret double %addtmp
 594 }
 595
 596 define double @foo(double %a, double %b) {
 597 entry:
 598         %multmp = mul double %a, %a
 599         %multmp1 = mul double 2.000000e+00, %a
 600         %multmp2 = mul double %multmp1, %b
 601         %addtmp = add double %multmp, %multmp2
 602         %multmp3 = mul double %b, %b
 603         %addtmp4 = add double %addtmp, %multmp3
 604         ret double %addtmp4
 605 }
 606
 607 define double @bar(double %a) {
 608 entry:
 609         %calltmp = call double @foo( double %a, double 4.000000e+00 )
 610         %calltmp1 = call double @bar( double 3.133700e+04 )
 611         %addtmp = add double %calltmp, %calltmp1
 612         ret double %addtmp
 613 }
 614
 615 declare double @cos(double)
 616
 617 define double @""() {
 618 entry:
 619         %calltmp = call double @cos( double 1.234000e+00 )
 620         ret double %calltmp
 621 }
 622 </pre>
 623 </div>
 624
 625 <p>When you quit the current demo, it dumps out the IR for the entire module
 626 generated.  Here you can see the big picture with all the functions referencing
 627 each other.</p>
 628
 629 <p>This wraps up this chapter of the Kaleidoscope tutorial.  Up next we'll
 630 describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
 631 support</a> to this so we can actually start running code!</p>
 632
 633 </div>
 634
 635
 636 <!-- *********************************************************************** -->
 637 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 638 <!-- *********************************************************************** -->
 639
 640 <div class="doc_text">
 641
 642 <p>
 643 Here is the complete code listing for our running example, enhanced with the
 644 LLVM code generator.    Because this uses the LLVM libraries, we need to link
 645 them in.  To do this, we use the <a
 646 href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
 647 our makefile/command line about which options to use:</p>
 648
 649 <div class="doc_code">
 650 <pre>
 651    # Compile
 652    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
 653    # Run
 654    ./toy
 655 </pre>
 656 </div>
 657
 658 <p>Here is the code:</p>
 659
 660 <div class="doc_code">
 661 <pre>
 662 // To build this:
 663 // See example below.
 664
 665 #include "llvm/DerivedTypes.h"
 666 #include "llvm/Module.h"
 667 #include "llvm/Analysis/Verifier.h"
 668 #include "llvm/Support/LLVMBuilder.h"
 669 #include &lt;cstdio&gt;
 670 #include &lt;string&gt;
 671 #include &lt;map&gt;
 672 #include &lt;vector&gt;
 673 using namespace llvm;
 674
 675 //===----------------------------------------------------------------------===//
 676 // Lexer
 677 //===----------------------------------------------------------------------===//
 678
 679 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
 680 // of these for known things.
 681 enum Token {
 682   tok_eof = -1,
 683
 684   // commands
 685   tok_def = -2, tok_extern = -3,
 686
 687   // primary
 688   tok_identifier = -4, tok_number = -5,
 689 };
 690
 691 static std::string IdentifierStr;  // Filled in if tok_identifier
 692 static double NumVal;              // Filled in if tok_number
 693
 694 /// gettok - Return the next token from standard input.
 695 static int gettok() {
 696   static int LastChar = ' ';
 697
 698   // Skip any whitespace.
 699   while (isspace(LastChar))
 700     LastChar = getchar();
 701
 702   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
 703     IdentifierStr = LastChar;
 704     while (isalnum((LastChar = getchar())))
 705       IdentifierStr += LastChar;
 706
 707     if (IdentifierStr == "def") return tok_def;
 708     if (IdentifierStr == "extern") return tok_extern;
 709     return tok_identifier;
 710   }
 711
 712   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
 713     std::string NumStr;
 714     do {
 715       NumStr += LastChar;
 716       LastChar = getchar();
 717     } while (isdigit(LastChar) || LastChar == '.');
 718
 719     NumVal = strtod(NumStr.c_str(), 0);
 720     return tok_number;
 721   }
 722
 723   if (LastChar == '#') {
 724     // Comment until end of line.
 725     do LastChar = getchar();
 726     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp; LastChar != '\r');
 727
 728     if (LastChar != EOF)
 729       return gettok();
 730   }
 731
 732   // Check for end of file.  Don't eat the EOF.
 733   if (LastChar == EOF)
 734     return tok_eof;
 735
 736   // Otherwise, just return the character as its ascii value.
 737   int ThisChar = LastChar;
 738   LastChar = getchar();
 739   return ThisChar;
 740 }
 741
 742 //===----------------------------------------------------------------------===//
 743 // Abstract Syntax Tree (aka Parse Tree)
 744 //===----------------------------------------------------------------------===//
 745
 746 /// ExprAST - Base class for all expression nodes.
 747 class ExprAST {
 748 public:
 749   virtual ~ExprAST() {}
 750   virtual Value *Codegen() = 0;
 751 };
 752
 753 /// NumberExprAST - Expression class for numeric literals like "1.0".
 754 class NumberExprAST : public ExprAST {
 755   double Val;
 756 public:
 757   explicit NumberExprAST(double val) : Val(val) {}
 758   virtual Value *Codegen();
 759 };
 760
 761 /// VariableExprAST - Expression class for referencing a variable, like "a".
 762 class VariableExprAST : public ExprAST {
 763   std::string Name;
 764 public:
 765   explicit VariableExprAST(const std::string &amp;name) : Name(name) {}
 766   virtual Value *Codegen();
 767 };
 768
 769 /// BinaryExprAST - Expression class for a binary operator.
 770 class BinaryExprAST : public ExprAST {
 771   char Op;
 772   ExprAST *LHS, *RHS;
 773 public:
 774   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
 775     : Op(op), LHS(lhs), RHS(rhs) {}
 776   virtual Value *Codegen();
 777 };
 778
 779 /// CallExprAST - Expression class for function calls.
 780 class CallExprAST : public ExprAST {
 781   std::string Callee;
 782   std::vector&lt;ExprAST*&gt; Args;
 783 public:
 784   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
 785     : Callee(callee), Args(args) {}
 786   virtual Value *Codegen();
 787 };
 788
 789 /// PrototypeAST - This class represents the "prototype" for a function,
 790 /// which captures its argument names as well as if it is an operator.
 791 class PrototypeAST {
 792   std::string Name;
 793   std::vector&lt;std::string&gt; Args;
 794 public:
 795   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args)
 796     : Name(name), Args(args) {}
 797
 798   Function *Codegen();
 799 };
 800
 801 /// FunctionAST - This class represents a function definition itself.
 802 class FunctionAST {
 803   PrototypeAST *Proto;
 804   ExprAST *Body;
 805 public:
 806   FunctionAST(PrototypeAST *proto, ExprAST *body)
 807     : Proto(proto), Body(body) {}
 808
 809   Function *Codegen();
 810 };
 811
 812 //===----------------------------------------------------------------------===//
 813 // Parser
 814 //===----------------------------------------------------------------------===//
 815
 816 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
 817 /// token the parser it looking at.  getNextToken reads another token from the
 818 /// lexer and updates CurTok with its results.
 819 static int CurTok;
 820 static int getNextToken() {
 821   return CurTok = gettok();
 822 }
 823
 824 /// BinopPrecedence - This holds the precedence for each binary operator that is
 825 /// defined.
 826 static std::map&lt;char, int&gt; BinopPrecedence;
 827
 828 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
 829 static int GetTokPrecedence() {
 830   if (!isascii(CurTok))
 831     return -1;
 832
 833   // Make sure it's a declared binop.
 834   int TokPrec = BinopPrecedence[CurTok];
 835   if (TokPrec &lt;= 0) return -1;
 836   return TokPrec;
 837 }
 838
 839 /// Error* - These are little helper functions for error handling.
 840 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
 841 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
 842 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
 843
 844 static ExprAST *ParseExpression();
 845
 846 /// identifierexpr
 847 ///   ::= identifier
 848 ///   ::= identifier '(' expression* ')'
 849 static ExprAST *ParseIdentifierExpr() {
 850   std::string IdName = IdentifierStr;
 851
 852   getNextToken();  // eat identifier.
 853
 854   if (CurTok != '(') // Simple variable ref.
 855     return new VariableExprAST(IdName);
 856
 857   // Call.
 858   getNextToken();  // eat (
 859   std::vector&lt;ExprAST*&gt; Args;
 860   while (1) {
 861     ExprAST *Arg = ParseExpression();
 862     if (!Arg) return 0;
 863     Args.push_back(Arg);
 864
 865     if (CurTok == ')') break;
 866
 867     if (CurTok != ',')
 868       return Error("Expected ')'");
 869     getNextToken();
 870   }
 871
 872   // Eat the ')'.
 873   getNextToken();
 874
 875   return new CallExprAST(IdName, Args);
 876 }
 877
 878 /// numberexpr ::= number
 879 static ExprAST *ParseNumberExpr() {
 880   ExprAST *Result = new NumberExprAST(NumVal);
 881   getNextToken(); // consume the number
 882   return Result;
 883 }
 884
 885 /// parenexpr ::= '(' expression ')'
 886 static ExprAST *ParseParenExpr() {
 887   getNextToken();  // eat (.
 888   ExprAST *V = ParseExpression();
 889   if (!V) return 0;
 890
 891   if (CurTok != ')')
 892     return Error("expected ')'");
 893   getNextToken();  // eat ).
 894   return V;
 895 }
 896
 897 /// primary
 898 ///   ::= identifierexpr
 899 ///   ::= numberexpr
 900 ///   ::= parenexpr
 901 static ExprAST *ParsePrimary() {
 902   switch (CurTok) {
 903   default: return Error("unknown token when expecting an expression");
 904   case tok_identifier: return ParseIdentifierExpr();
 905   case tok_number:     return ParseNumberExpr();
 906   case '(':            return ParseParenExpr();
 907   }
 908 }
 909
 910 /// binoprhs
 911 ///   ::= ('+' primary)*
 912 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
 913   // If this is a binop, find its precedence.
 914   while (1) {
 915     int TokPrec = GetTokPrecedence();
 916
 917     // If this is a binop that binds at least as tightly as the current binop,
 918     // consume it, otherwise we are done.
 919     if (TokPrec &lt; ExprPrec)
 920       return LHS;
 921
 922     // Okay, we know this is a binop.
 923     int BinOp = CurTok;
 924     getNextToken();  // eat binop
 925
 926     // Parse the primary expression after the binary operator.
 927     ExprAST *RHS = ParsePrimary();
 928     if (!RHS) return 0;
 929
 930     // If BinOp binds less tightly with RHS than the operator after RHS, let
 931     // the pending operator take RHS as its LHS.
 932     int NextPrec = GetTokPrecedence();
 933     if (TokPrec &lt; NextPrec) {
 934       RHS = ParseBinOpRHS(TokPrec+1, RHS);
 935       if (RHS == 0) return 0;
 936     }
 937
 938     // Merge LHS/RHS.
 939     LHS = new BinaryExprAST(BinOp, LHS, RHS);
 940   }
 941 }
 942
 943 /// expression
 944 ///   ::= primary binoprhs
 945 ///
 946 static ExprAST *ParseExpression() {
 947   ExprAST *LHS = ParsePrimary();
 948   if (!LHS) return 0;
 949
 950   return ParseBinOpRHS(0, LHS);
 951 }
 952
 953 /// prototype
 954 ///   ::= id '(' id* ')'
 955 static PrototypeAST *ParsePrototype() {
 956   if (CurTok != tok_identifier)
 957     return ErrorP("Expected function name in prototype");
 958
 959   std::string FnName = IdentifierStr;
 960   getNextToken();
 961
 962   if (CurTok != '(')
 963     return ErrorP("Expected '(' in prototype");
 964
 965   std::vector&lt;std::string&gt; ArgNames;
 966   while (getNextToken() == tok_identifier)
 967     ArgNames.push_back(IdentifierStr);
 968   if (CurTok != ')')
 969     return ErrorP("Expected ')' in prototype");
 970
 971   // success.
 972   getNextToken();  // eat ')'.
 973
 974   return new PrototypeAST(FnName, ArgNames);
 975 }
 976
 977 /// definition ::= 'def' prototype expression
 978 static FunctionAST *ParseDefinition() {
 979   getNextToken();  // eat def.
 980   PrototypeAST *Proto = ParsePrototype();
 981   if (Proto == 0) return 0;
 982
 983   if (ExprAST *E = ParseExpression())
 984     return new FunctionAST(Proto, E);
 985   return 0;
 986 }
 987
 988 /// toplevelexpr ::= expression
 989 static FunctionAST *ParseTopLevelExpr() {
 990   if (ExprAST *E = ParseExpression()) {
 991     // Make an anonymous proto.
 992     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
 993     return new FunctionAST(Proto, E);
 994   }
 995   return 0;
 996 }
 997
 998 /// external ::= 'extern' prototype
 999 static PrototypeAST *ParseExtern() {
1000   getNextToken();  // eat extern.
1001   return ParsePrototype();
1002 }
1003
1004 //===----------------------------------------------------------------------===//
1005 // Code Generation
1006 //===----------------------------------------------------------------------===//
1007
1008 static Module *TheModule;
1009 static LLVMBuilder Builder;
1010 static std::map&lt;std::string, Value*&gt; NamedValues;
1011
1012 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1013
1014 Value *NumberExprAST::Codegen() {
1015   return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1016 }
1017
1018 Value *VariableExprAST::Codegen() {
1019   // Look this variable up in the function.
1020   Value *V = NamedValues[Name];
1021   return V ? V : ErrorV("Unknown variable name");
1022 }
1023
1024 Value *BinaryExprAST::Codegen() {
1025   Value *L = LHS-&gt;Codegen();
1026   Value *R = RHS-&gt;Codegen();
1027   if (L == 0 || R == 0) return 0;
1028
1029   switch (Op) {
1030   case '+': return Builder.CreateAdd(L, R, "addtmp");
1031   case '-': return Builder.CreateSub(L, R, "subtmp");
1032   case '*': return Builder.CreateMul(L, R, "multmp");
1033   case '&lt;':
1034     L = Builder.CreateFCmpULT(L, R, "multmp");
1035     // Convert bool 0/1 to double 0.0 or 1.0
1036     return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1037   default: return ErrorV("invalid binary operator");
1038   }
1039 }
1040
1041 Value *CallExprAST::Codegen() {
1042   // Look up the name in the global module table.
1043   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1044   if (CalleeF == 0)
1045     return ErrorV("Unknown function referenced");
1046
1047   // If argument mismatch error.
1048   if (CalleeF-&gt;arg_size() != Args.size())
1049     return ErrorV("Incorrect # arguments passed");
1050
1051   std::vector&lt;Value*&gt; ArgsV;
1052   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1053     ArgsV.push_back(Args[i]-&gt;Codegen());
1054     if (ArgsV.back() == 0) return 0;
1055   }
1056
1057   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1058 }
1059
1060 Function *PrototypeAST::Codegen() {
1061   // Make the function type:  double(double,double) etc.
1062   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::DoubleTy);
1063   FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1064
1065   Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1066
1067   // If F conflicted, there was already something named 'Name'.  If it has a
1068   // body, don't allow redefinition or reextern.
1069   if (F-&gt;getName() != Name) {
1070     // Delete the one we just made and get the existing one.
1071     F-&gt;eraseFromParent();
1072     F = TheModule-&gt;getFunction(Name);
1073
1074     // If F already has a body, reject this.
1075     if (!F-&gt;empty()) {
1076       ErrorF("redefinition of function");
1077       return 0;
1078     }
1079
1080     // If F took a different number of args, reject.
1081     if (F-&gt;arg_size() != Args.size()) {
1082       ErrorF("redefinition of function with different # args");
1083       return 0;
1084     }
1085   }
1086
1087   // Set names for all arguments.
1088   unsigned Idx = 0;
1089   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1090        ++AI, ++Idx) {
1091     AI-&gt;setName(Args[Idx]);
1092
1093     // Add arguments to variable symbol table.
1094     NamedValues[Args[Idx]] = AI;
1095   }
1096
1097   return F;
1098 }
1099
1100 Function *FunctionAST::Codegen() {
1101   NamedValues.clear();
1102
1103   Function *TheFunction = Proto-&gt;Codegen();
1104   if (TheFunction == 0)
1105     return 0;
1106
1107   // Create a new basic block to start insertion into.
1108   BasicBlock *BB = new BasicBlock("entry", TheFunction);
1109   Builder.SetInsertPoint(BB);
1110
1111   if (Value *RetVal = Body-&gt;Codegen()) {
1112     // Finish off the function.
1113     Builder.CreateRet(RetVal);
1114
1115     // Validate the generated code, checking for consistency.
1116     verifyFunction(*TheFunction);
1117     return TheFunction;
1118   }
1119
1120   // Error reading body, remove function.
1121   TheFunction-&gt;eraseFromParent();
1122   return 0;
1123 }
1124
1125 //===----------------------------------------------------------------------===//
1126 // Top-Level parsing and JIT Driver
1127 //===----------------------------------------------------------------------===//
1128
1129 static void HandleDefinition() {
1130   if (FunctionAST *F = ParseDefinition()) {
1131     if (Function *LF = F-&gt;Codegen()) {
1132       fprintf(stderr, "Read function definition:");
1133       LF-&gt;dump();
1134     }
1135   } else {
1136     // Skip token for error recovery.
1137     getNextToken();
1138   }
1139 }
1140
1141 static void HandleExtern() {
1142   if (PrototypeAST *P = ParseExtern()) {
1143     if (Function *F = P-&gt;Codegen()) {
1144       fprintf(stderr, "Read extern: ");
1145       F-&gt;dump();
1146     }
1147   } else {
1148     // Skip token for error recovery.
1149     getNextToken();
1150   }
1151 }
1152
1153 static void HandleTopLevelExpression() {
1154   // Evaluate a top level expression into an anonymous function.
1155   if (FunctionAST *F = ParseTopLevelExpr()) {
1156     if (Function *LF = F-&gt;Codegen()) {
1157       fprintf(stderr, "Read top-level expression:");
1158       LF-&gt;dump();
1159     }
1160   } else {
1161     // Skip token for error recovery.
1162     getNextToken();
1163   }
1164 }
1165
1166 /// top ::= definition | external | expression | ';'
1167 static void MainLoop() {
1168   while (1) {
1169     fprintf(stderr, "ready&gt; ");
1170     switch (CurTok) {
1171     case tok_eof:    return;
1172     case ';':        getNextToken(); break;  // ignore top level semicolons.
1173     case tok_def:    HandleDefinition(); break;
1174     case tok_extern: HandleExtern(); break;
1175     default:         HandleTopLevelExpression(); break;
1176     }
1177   }
1178 }
1179
1180
1181
1182 //===----------------------------------------------------------------------===//
1183 // "Library" functions that can be "extern'd" from user code.
1184 //===----------------------------------------------------------------------===//
1185
1186 /// putchard - putchar that takes a double and returns 0.
1187 extern "C"
1188 double putchard(double X) {
1189   putchar((char)X);
1190   return 0;
1191 }
1192
1193 //===----------------------------------------------------------------------===//
1194 // Main driver code.
1195 //===----------------------------------------------------------------------===//
1196
1197 int main() {
1198   TheModule = new Module("my cool jit");
1199
1200   // Install standard binary operators.
1201   // 1 is lowest precedence.
1202   BinopPrecedence['&lt;'] = 10;
1203   BinopPrecedence['+'] = 20;
1204   BinopPrecedence['-'] = 20;
1205   BinopPrecedence['*'] = 40;  // highest.
1206
1207   // Prime the first token.
1208   fprintf(stderr, "ready&gt; ");
1209   getNextToken();
1210
1211   MainLoop();
1212   TheModule-&gt;dump();
1213   return 0;
1214 }
1215 </pre>
1216 </div>
1217 </div>
1218
1219 <!-- *********************************************************************** -->
1220 <hr>
1221 <address>
1222   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1223   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1224   <a href="http://validator.w3.org/check/referer"><img
1225   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1226
1227   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1228   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1229   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
1230 </address>
1231 </body>
1232 </html>