1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Implementing code generation to LLVM IR</title>
7 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
8 <meta name="author" content="Chris Lattner">
9 <link rel="stylesheet" href="../llvm.css" type="text/css">
14 <div class="doc_title">Kaleidoscope: Code generation to LLVM IR</div>
19 <li><a href="#intro">Chapter 3 Introduction</a></li>
20 <li><a href="#basics">Code Generation setup</a></li>
21 <li><a href="#exprs">Expression Code Generation</a></li>
22 <li><a href="#funcs">Function Code Generation</a></li>
23 <li><a href="#driver">Driver Changes and Closing Thoughts</a></li>
24 <li><a href="#code">Full Code Listing</a></li>
29 <div class="doc_author">
30 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
33 <!-- *********************************************************************** -->
34 <div class="doc_section"><a name="intro">Chapter 3 Introduction</a></div>
35 <!-- *********************************************************************** -->
37 <div class="doc_text">
39 <p>Welcome to Chapter 3 of the "<a href="index.html">Implementing a language
40 with LLVM</a>" tutorial. This chapter shows you how to transform the <a
41 href="LangImpl2.html">Abstract Syntax Tree built in Chapter 2</a> into LLVM IR.
42 This will teach you a little bit about how LLVM does things, as well as
43 demonstrate how easy it is to use. It's much more work to build a lexer and
44 parser than it is to generate LLVM IR code.
49 <!-- *********************************************************************** -->
50 <div class="doc_section"><a name="basics">Code Generation setup</a></div>
51 <!-- *********************************************************************** -->
53 <div class="doc_text">
56 In order to generate LLVM IR, we want some simple setup to get started. First,
57 we define virtual codegen methods in each AST class:</p>
59 <div class="doc_code">
61 /// ExprAST - Base class for all expression nodes.
65 <b>virtual Value *Codegen() = 0;</b>
68 /// NumberExprAST - Expression class for numeric literals like "1.0".
69 class NumberExprAST : public ExprAST {
72 explicit NumberExprAST(double val) : Val(val) {}
73 <b>virtual Value *Codegen();</b>
79 <p>The Codegen() method says to emit IR for that AST node and all things it
80 depends on, and they all return an LLVM Value object.
81 "Value" is the class used to represent a "<a
82 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
83 Assignment (SSA)</a> register" or "SSA value" in LLVM. The most distinct aspect
84 of SSA values is that their value is computed as the related instruction
85 executes, and it does not get a new value until (and if) the instruction
86 re-executes. In order words, there is no way to "change" an SSA value. For
87 more information, please read up on <a
88 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">Static Single
89 Assignment</a> - the concepts are really quite natural once you grok them.</p>
91 <p>Note that instead of adding virtual methods to the ExprAST class hierarchy,
92 it could also make sense to use a visitor pattern or some other way to model
93 this. Again, this tutorial won't dwell on good software engineering practices:
94 for our purposes, adding a virtual method is simplest.</p>
97 second thing we want is an "Error" method like we used for parser, which will
98 be used to report errors found during code generation (for example, use of an
99 undeclared parameter):</p>
101 <div class="doc_code">
103 Value *ErrorV(const char *Str) { Error(Str); return 0; }
105 static Module *TheModule;
106 static LLVMBuilder Builder;
107 static std::map<std::string, Value*> NamedValues;
111 <p>The static variables will be used during code generation. <tt>TheModule</tt>
112 is the LLVM construct that contains all of the functions and global variables in
113 a chunk of code. In many ways, it is the top-level structure that the LLVM IR
114 uses to contain code.</p>
116 <p>The <tt>Builder</tt> object is a helper object that makes it easy to generate
117 LLVM instructions. Instances of the <a
118 href="http://llvm.org/doxygen/LLVMBuilder_8h-source.html"><tt>LLVMBuilder</tt>
119 class</a> keeps track of the current place to
120 insert instructions and has methods to create new instructions.</p>
122 <p>The <tt>NamedValues</tt> map keeps track of which values are defined in the
123 current scope and what their LLVM representation is. In this form of
124 Kaleidoscope, the only things that can be referenced are function parameters.
125 As such, function parameters will be in this map when generating code for their
129 With these basics in place, we can start talking about how to generate code for
130 each expression. Note that this assumes that the <tt>Builder</tt> has been set
131 up to generate code <em>into</em> something. For now, we'll assume that this
132 has already been done, and we'll just use it to emit code.
137 <!-- *********************************************************************** -->
138 <div class="doc_section"><a name="exprs">Expression Code Generation</a></div>
139 <!-- *********************************************************************** -->
141 <div class="doc_text">
143 <p>Generating LLVM code for expression nodes is very straight-forward: less
144 than 45 lines of commented code for all four of our expression nodes. First,
145 we'll do numeric literals:</p>
147 <div class="doc_code">
149 Value *NumberExprAST::Codegen() {
150 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
155 <p>In the LLVM IR, numeric constants are represented with the
156 <tt>ConstantFP</tt> class, which holds the numeric value in an <tt>APFloat</tt>
157 internally (<tt>APFloat</tt> has the capability of holding floating point
158 constants of <em>A</em>rbitrary <em>P</em>recision). This code basically just
159 creates and returns a <tt>ConstantFP</tt>. Note that in the LLVM IR
160 that constants are all uniqued together and shared. For this reason, the API
161 uses "the foo::get(..)" idiom instead of "new foo(..)" or "foo::create(..).</p>
163 <div class="doc_code">
165 Value *VariableExprAST::Codegen() {
166 // Look this variable up in the function.
167 Value *V = NamedValues[Name];
168 return V ? V : ErrorV("Unknown variable name");
173 <p>References to variables is also quite simple here. In the simple version
174 of Kaleidoscope, we assume that the variable has already been emited somewhere
175 and its value is available. In practice, the only values that can be in the
176 <tt>NamedValues</tt> map are function arguments. This
177 code simply checks to see that the specified name is in the map (if not, an
178 unknown variable is being referenced) and returns the value for it.</p>
180 <div class="doc_code">
182 Value *BinaryExprAST::Codegen() {
183 Value *L = LHS->Codegen();
184 Value *R = RHS->Codegen();
185 if (L == 0 || R == 0) return 0;
188 case '+': return Builder.CreateAdd(L, R, "addtmp");
189 case '-': return Builder.CreateSub(L, R, "subtmp");
190 case '*': return Builder.CreateMul(L, R, "multmp");
192 L = Builder.CreateFCmpULT(L, R, "multmp");
193 // Convert bool 0/1 to double 0.0 or 1.0
194 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
195 default: return ErrorV("invalid binary operator");
201 <p>Binary operators start to get more interesting. The basic idea here is that
202 we recursively emit code for the left-hand side of the expression, then the
203 right-hand side, then we compute the result of the binary expression. In this
204 code, we do a simple switch on the opcode to create the right LLVM instruction.
207 <p>In this example, the LLVM builder class is starting to show its value.
208 Because it knows where to insert the newly created instruction, you just have to
209 specificy what instruction to create (e.g. with <tt>CreateAdd</tt>), which
210 operands to use (<tt>L</tt> and <tt>R</tt> here) and optionally provide a name
211 for the generated instruction. One nice thing about LLVM is that the name is
212 just a hint: if there are multiple additions in a single function, the first
213 will be named "addtmp" and the second will be "autorenamed" by adding a suffix,
214 giving it a name like "addtmp42". Local value names for instructions are purely
215 optional, but it makes it much easier to read the IR dumps.</p>
217 <p><a href="../LangRef.html#instref">LLVM instructions</a> are constrained to
218 have very strict type properties: for example, the Left and Right operators of
219 an <a href="../LangRef.html#i_add">add instruction</a> have to have the same
220 type, and that the result of the add matches the operands. Because all values
221 in Kaleidoscope are doubles, this makes for very simple code for add, sub and
224 <p>On the other hand, LLVM specifies that the <a
225 href="../LangRef.html#i_fcmp">fcmp instruction</a> always returns an 'i1' value
226 (a one bit integer). However, Kaleidoscope wants the value to be a 0.0 or 1.0
227 value. In order to get these semantics, we combine the fcmp instruction with
228 a <a href="../LangRef.html#i_uitofp">uitofp instruction</a>. This instruction
229 converts its input integer into a floating point value by treating the input
230 as an unsigned value. In contrast, if we used the <a
231 href="../LangRef.html#i_sitofp">sitofp instruction</a>, the Kaleidoscope '<'
232 operator would return 0.0 and -1.0, depending on the input value.</p>
234 <div class="doc_code">
236 Value *CallExprAST::Codegen() {
237 // Look up the name in the global module table.
238 Function *CalleeF = TheModule->getFunction(Callee);
240 return ErrorV("Unknown function referenced");
242 // If argument mismatch error.
243 if (CalleeF->arg_size() != Args.size())
244 return ErrorV("Incorrect # arguments passed");
246 std::vector<Value*> ArgsV;
247 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
248 ArgsV.push_back(Args[i]->Codegen());
249 if (ArgsV.back() == 0) return 0;
252 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
257 <p>Code generation for function calls is quite straight-forward with LLVM. The
258 code above first looks the name of the function up in the LLVM Module's symbol
259 table. Recall that the LLVM Module is the container that holds all of the
260 functions we are JIT'ing. By giving each function the same name as what the
261 user specifies, we can use the LLVM symbol table to resolve function names for
264 <p>Once we have the function to call, we recursively codegen each argument that
265 is to be passed in, and create an LLVM <a href="../LangRef.html#i_call">call
266 instruction</a>. Note that LLVM uses the native C calling conventions by
267 default, allowing these calls to call into standard library functions like
268 "sin" and "cos" with no additional effort.</p>
270 <p>This wraps up our handling of the four basic expressions that we have so far
271 in Kaleidoscope. Feel free to go in and add some more. For example, by
272 browsing the <a href="../LangRef.html">LLVM language reference</a> you'll find
273 several other interesting instructions that are really easy to plug into our
278 <!-- *********************************************************************** -->
279 <div class="doc_section"><a name="funcs">Function Code Generation</a></div>
280 <!-- *********************************************************************** -->
282 <div class="doc_text">
284 <p>Code generation for prototypes and functions has to handle a number of
285 details, which make their code less beautiful and elegant than expression code
286 generation, but they illustrate some important points. First, lets talk about
287 code generation for prototypes: this is used both for function bodies as well
288 as external function declarations. The code starts with:</p>
290 <div class="doc_code">
292 Function *PrototypeAST::Codegen() {
293 // Make the function type: double(double,double) etc.
294 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
295 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
297 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
301 <p>This code packs a lot of power into a few lines. Note first that this
302 function returns a Function* instead of a Value*. Because a "prototype" really
303 talks about the external interface for a function (not the value computed by
304 an expression), it makes sense for it to return the LLVM Function it corresponds
305 to when codegen'd.</p>
307 <p>The next step is to create
308 the <tt>FunctionType</tt> that should be used for a given Prototype. Since all
309 function arguments in Kaleidoscope are of type double, the first line creates
310 a vector of "N" LLVM Double types. It then uses the <tt>FunctionType::get</tt>
311 method to create a function type that takes "N" doubles as arguments, returns
312 one double as a result, and that is not vararg (the false parameter indicates
313 this). Note that Types in LLVM are uniqued just like Constants are, so you
314 don't "new" a type, you "get" it.</p>
316 <p>The final line above actually creates the function that the prototype will
317 correspond to. This indicates which type, linkage, and name to use, and which
318 module to insert into. "<a href="LangRef.html#linkage">external linkage</a>"
319 means that the function may be defined outside the current module and/or that it
320 is callable by functions outside the module. The Name passed in is the name the
321 user specified: since "<tt>TheModule</tt>" is specified, this name is registered
322 in "<tt>TheModule</tt>"s symbol table, which is used by the function call code
325 <div class="doc_code">
327 // If F conflicted, there was already something named 'Name'. If it has a
328 // body, don't allow redefinition or reextern.
329 if (F->getName() != Name) {
330 // Delete the one we just made and get the existing one.
331 F->eraseFromParent();
332 F = TheModule->getFunction(Name);
336 <p>The Module symbol table works just like the Function symbol table when it
337 comes to name conflicts: if a new function is created with a name was previously
338 added to the symbol table, it will get implicitly renamed when added to the
339 Module. The code above exploits this fact to tell if there was a previous
340 definition of this function.</p>
342 <p>In Kaleidoscope, I choose to allow redefinitions of functions in two cases:
343 first, we want to allow 'extern'ing a function more than once, so long as the
344 prototypes for the externs match (since all arguments have the same type, we
345 just have to check that the number of arguments match). Second, we want to
346 allow 'extern'ing a function and then definining a body for it. This is useful
347 when defining mutually recursive functions.</p>
349 <p>In order to implement this, the code above first checks to see if there is
350 a collision on the name of the function. If so, it deletes the function we just
351 created (by calling <tt>eraseFromParent</tt>) and then calling
352 <tt>getFunction</tt> to get the existing function with the specified name. Note
353 that many APIs in LLVM have "erase" forms and "remove" forms. The "remove" form
354 unlinks the object from its parent (e.g. a Function from a Module) and returns
355 it. The "erase" form unlinks the object and then deletes it.</p>
357 <div class="doc_code">
359 // If F already has a body, reject this.
360 if (!F->empty()) {
361 ErrorF("redefinition of function");
365 // If F took a different number of args, reject.
366 if (F->arg_size() != Args.size()) {
367 ErrorF("redefinition of function with different # args");
374 <p>In order to verify the logic above, we first check to see if the preexisting
375 function is "empty". In this case, empty means that it has no basic blocks in
376 it, which means it has no body. If it has no body, this means its a forward
377 declaration. Since we don't allow anything after a full definition of the
378 function, the code rejects this case. If the previous reference to a function
379 was an 'extern', we simply verify that the number of arguments for that
380 definition and this one match up. If not, we emit an error.</p>
382 <div class="doc_code">
384 // Set names for all arguments.
386 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
388 AI->setName(Args[Idx]);
390 // Add arguments to variable symbol table.
391 NamedValues[Args[Idx]] = AI;
398 <p>The last bit of code for prototypes loops over all of the arguments in the
399 function, setting the name of the LLVM Argument objects to match and registering
400 the arguments in the <tt>NamedValues</tt> map for future use by the
401 <tt>VariableExprAST</tt> AST node. Once this is set up, it returns the Function
402 object to the caller. Note that we don't check for conflicting
403 argument names here (e.g. "extern foo(a b a)"). Doing so would be very
404 straight-forward.</p>
406 <div class="doc_code">
408 Function *FunctionAST::Codegen() {
411 Function *TheFunction = Proto->Codegen();
412 if (TheFunction == 0)
417 <p>Code generation for function definitions starts out simply enough: first we
418 codegen the prototype and verify that it is ok. We also clear out the
419 <tt>NamedValues</tt> map to make sure that there isn't anything in it from the
420 last function we compiled.</p>
422 <div class="doc_code">
424 // Create a new basic block to start insertion into.
425 BasicBlock *BB = new BasicBlock("entry", TheFunction);
426 Builder.SetInsertPoint(BB);
428 if (Value *RetVal = Body->Codegen()) {
432 <p>Now we get to the point where the <tt>Builder</tt> is set up. The first
433 line creates a new <a href="http://en.wikipedia.org/wiki/Basic_block">basic
434 block</a> (named "entry"), which is inserted into <tt>TheFunction</tt>. The
435 second line then tells the builder that new instructions should be inserted into
436 the end of the new basic block. Basic blocks in LLVM are an important part
437 of functions that define the <a
438 href="http://en.wikipedia.org/wiki/Control_flow_graph">Control Flow Graph</a>.
439 Since we don't have any control flow, our functions will only contain one
440 block so far. We'll fix this in a future installment :).</p>
442 <div class="doc_code">
444 if (Value *RetVal = Body->Codegen()) {
445 // Finish off the function.
446 Builder.CreateRet(RetVal);
448 // Validate the generated code, checking for consistency.
449 verifyFunction(*TheFunction);
455 <p>Once the insertion point is set up, we call the <tt>CodeGen()</tt> method for
456 the root expression of the function. If no error happens, this emits code to
457 compute the expression into the entry block and returns the value that was
458 computed. Assuming no error, we then create an LLVM <a
459 href="../LangRef.html#i_ret">ret instruction</a>, which completes the function.
460 Once the function is built, we call the <tt>verifyFunction</tt> function, which
461 is provided by LLVM. This function does a variety of consistency checks on the
462 generated code, to determine if our compiler is doing everything right. Using
463 this is important: it can catch a lot of bugs. Once the function is finished
464 and validated, we return it.</p>
466 <div class="doc_code">
468 // Error reading body, remove function.
469 TheFunction->eraseFromParent();
475 <p>The only piece left here is handling of the error case. For simplicity, we
476 simply handle this by deleting the function we produced with the
477 <tt>eraseFromParent</tt> method. This allows the user to redefine a function
478 that they incorrectly typed in before: if we didn't delete it, it would live in
479 the symbol table, with a body, preventing future redefinition.</p>
481 <p>This code does have a bug though. Since the <tt>PrototypeAST::Codegen</tt>
482 can return a previously defined forward declaration, this can actually delete
483 a forward declaration. There are a number of ways to fix this bug, see what you
484 can come up with! Here is a testcase:</p>
486 <div class="doc_code">
488 extern foo(a b); # ok, defines foo.
489 def foo(a b) c; # error, 'c' is invalid.
490 def bar() foo(1, 2); # error, unknown function "foo"
496 <!-- *********************************************************************** -->
497 <div class="doc_section"><a name="driver">Driver Changes and
498 Closing Thoughts</a></div>
499 <!-- *********************************************************************** -->
501 <div class="doc_text">
504 For now, code generation to LLVM doesn't really get us much, except that we can
505 look at the pretty IR calls. The sample code inserts calls to Codegen into the
506 "<tt>HandleDefinition</tt>", "<tt>HandleExtern</tt>" etc functions, and then
507 dumps out the LLVM IR. This gives a nice way to look at the LLVM IR for simple
508 functions. For example:
511 <div class="doc_code">
514 ready> Read top-level expression:
515 define double @""() {
517 %addtmp = add double 4.000000e+00, 5.000000e+00
523 <p>Note how the parser turns the top-level expression into anonymous functions
524 for us. This will be handy when we add JIT support in the next chapter. Also
525 note that the code is very literally transcribed, no optimizations are being
526 performed. We will add optimizations explicitly in the next chapter.</p>
528 <div class="doc_code">
530 ready> <b>def foo(a b) a*a + 2*a*b + b*b;</b>
531 ready> Read function definition:
532 define double @foo(double %a, double %b) {
534 %multmp = mul double %a, %a
535 %multmp1 = mul double 2.000000e+00, %a
536 %multmp2 = mul double %multmp1, %b
537 %addtmp = add double %multmp, %multmp2
538 %multmp3 = mul double %b, %b
539 %addtmp4 = add double %addtmp, %multmp3
545 <p>This shows some simple arithmetic. Notice the striking similarity to the
546 LLVM builder calls that we use to create the instructions.</p>
548 <div class="doc_code">
550 ready> <b>def bar(a) foo(a, 4.0) + bar(31337);</b>
551 ready> Read function definition:
552 define double @bar(double %a) {
554 %calltmp = call double @foo( double %a, double 4.000000e+00 )
555 %calltmp1 = call double @bar( double 3.133700e+04 )
556 %addtmp = add double %calltmp, %calltmp1
562 <p>This shows some function calls. Note that this function will take a long
563 time to execute if you call it. In the future we'll add conditional control
564 flow to make recursion actually be useful :).</p>
566 <div class="doc_code">
568 ready> <b>extern cos(x);</b>
569 ready> Read extern:
570 declare double @cos(double)
572 ready> <b>cos(1.234);</b>
573 ready> Read top-level expression:
574 define double @""() {
576 %calltmp = call double @cos( double 1.234000e+00 )
582 <p>This shows an extern for the libm "cos" function, and a call to it.</p>
585 <div class="doc_code">
588 ; ModuleID = 'my cool jit'
590 define double @""() {
592 %addtmp = add double 4.000000e+00, 5.000000e+00
596 define double @foo(double %a, double %b) {
598 %multmp = mul double %a, %a
599 %multmp1 = mul double 2.000000e+00, %a
600 %multmp2 = mul double %multmp1, %b
601 %addtmp = add double %multmp, %multmp2
602 %multmp3 = mul double %b, %b
603 %addtmp4 = add double %addtmp, %multmp3
607 define double @bar(double %a) {
609 %calltmp = call double @foo( double %a, double 4.000000e+00 )
610 %calltmp1 = call double @bar( double 3.133700e+04 )
611 %addtmp = add double %calltmp, %calltmp1
615 declare double @cos(double)
617 define double @""() {
619 %calltmp = call double @cos( double 1.234000e+00 )
625 <p>When you quit the current demo, it dumps out the IR for the entire module
626 generated. Here you can see the big picture with all the functions referencing
629 <p>This wraps up this chapter of the Kaleidoscope tutorial. Up next we'll
630 describe how to <a href="LangImpl4.html">add JIT codegen and optimizer
631 support</a> to this so we can actually start running code!</p>
636 <!-- *********************************************************************** -->
637 <div class="doc_section"><a name="code">Full Code Listing</a></div>
638 <!-- *********************************************************************** -->
640 <div class="doc_text">
643 Here is the complete code listing for our running example, enhanced with the
644 LLVM code generator. Because this uses the LLVM libraries, we need to link
645 them in. To do this, we use the <a
646 href="http://llvm.org/cmds/llvm-config.html">llvm-config</a> tool to inform
647 our makefile/command line about which options to use:</p>
649 <div class="doc_code">
652 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core` -o toy
658 <p>Here is the code:</p>
660 <div class="doc_code">
663 // See example below.
665 #include "llvm/DerivedTypes.h"
666 #include "llvm/Module.h"
667 #include "llvm/Analysis/Verifier.h"
668 #include "llvm/Support/LLVMBuilder.h"
669 #include <cstdio>
670 #include <string>
672 #include <vector>
673 using namespace llvm;
675 //===----------------------------------------------------------------------===//
677 //===----------------------------------------------------------------------===//
679 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
680 // of these for known things.
685 tok_def = -2, tok_extern = -3,
688 tok_identifier = -4, tok_number = -5,
691 static std::string IdentifierStr; // Filled in if tok_identifier
692 static double NumVal; // Filled in if tok_number
694 /// gettok - Return the next token from standard input.
695 static int gettok() {
696 static int LastChar = ' ';
698 // Skip any whitespace.
699 while (isspace(LastChar))
700 LastChar = getchar();
702 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
703 IdentifierStr = LastChar;
704 while (isalnum((LastChar = getchar())))
705 IdentifierStr += LastChar;
707 if (IdentifierStr == "def") return tok_def;
708 if (IdentifierStr == "extern") return tok_extern;
709 return tok_identifier;
712 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
716 LastChar = getchar();
717 } while (isdigit(LastChar) || LastChar == '.');
719 NumVal = strtod(NumStr.c_str(), 0);
723 if (LastChar == '#') {
724 // Comment until end of line.
725 do LastChar = getchar();
726 while (LastChar != EOF && LastChar != '\n' & LastChar != '\r');
732 // Check for end of file. Don't eat the EOF.
736 // Otherwise, just return the character as its ascii value.
737 int ThisChar = LastChar;
738 LastChar = getchar();
742 //===----------------------------------------------------------------------===//
743 // Abstract Syntax Tree (aka Parse Tree)
744 //===----------------------------------------------------------------------===//
746 /// ExprAST - Base class for all expression nodes.
749 virtual ~ExprAST() {}
750 virtual Value *Codegen() = 0;
753 /// NumberExprAST - Expression class for numeric literals like "1.0".
754 class NumberExprAST : public ExprAST {
757 explicit NumberExprAST(double val) : Val(val) {}
758 virtual Value *Codegen();
761 /// VariableExprAST - Expression class for referencing a variable, like "a".
762 class VariableExprAST : public ExprAST {
765 explicit VariableExprAST(const std::string &name) : Name(name) {}
766 virtual Value *Codegen();
769 /// BinaryExprAST - Expression class for a binary operator.
770 class BinaryExprAST : public ExprAST {
774 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
775 : Op(op), LHS(lhs), RHS(rhs) {}
776 virtual Value *Codegen();
779 /// CallExprAST - Expression class for function calls.
780 class CallExprAST : public ExprAST {
782 std::vector<ExprAST*> Args;
784 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
785 : Callee(callee), Args(args) {}
786 virtual Value *Codegen();
789 /// PrototypeAST - This class represents the "prototype" for a function,
790 /// which captures its argument names as well as if it is an operator.
793 std::vector<std::string> Args;
795 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
796 : Name(name), Args(args) {}
801 /// FunctionAST - This class represents a function definition itself.
806 FunctionAST(PrototypeAST *proto, ExprAST *body)
807 : Proto(proto), Body(body) {}
812 //===----------------------------------------------------------------------===//
814 //===----------------------------------------------------------------------===//
816 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
817 /// token the parser it looking at. getNextToken reads another token from the
818 /// lexer and updates CurTok with its results.
820 static int getNextToken() {
821 return CurTok = gettok();
824 /// BinopPrecedence - This holds the precedence for each binary operator that is
826 static std::map<char, int> BinopPrecedence;
828 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
829 static int GetTokPrecedence() {
830 if (!isascii(CurTok))
833 // Make sure it's a declared binop.
834 int TokPrec = BinopPrecedence[CurTok];
835 if (TokPrec <= 0) return -1;
839 /// Error* - These are little helper functions for error handling.
840 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
841 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
842 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
844 static ExprAST *ParseExpression();
848 /// ::= identifier '(' expression* ')'
849 static ExprAST *ParseIdentifierExpr() {
850 std::string IdName = IdentifierStr;
852 getNextToken(); // eat identifier.
854 if (CurTok != '(') // Simple variable ref.
855 return new VariableExprAST(IdName);
858 getNextToken(); // eat (
859 std::vector<ExprAST*> Args;
861 ExprAST *Arg = ParseExpression();
865 if (CurTok == ')') break;
868 return Error("Expected ')'");
875 return new CallExprAST(IdName, Args);
878 /// numberexpr ::= number
879 static ExprAST *ParseNumberExpr() {
880 ExprAST *Result = new NumberExprAST(NumVal);
881 getNextToken(); // consume the number
885 /// parenexpr ::= '(' expression ')'
886 static ExprAST *ParseParenExpr() {
887 getNextToken(); // eat (.
888 ExprAST *V = ParseExpression();
892 return Error("expected ')'");
893 getNextToken(); // eat ).
898 /// ::= identifierexpr
901 static ExprAST *ParsePrimary() {
903 default: return Error("unknown token when expecting an expression");
904 case tok_identifier: return ParseIdentifierExpr();
905 case tok_number: return ParseNumberExpr();
906 case '(': return ParseParenExpr();
911 /// ::= ('+' primary)*
912 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
913 // If this is a binop, find its precedence.
915 int TokPrec = GetTokPrecedence();
917 // If this is a binop that binds at least as tightly as the current binop,
918 // consume it, otherwise we are done.
919 if (TokPrec < ExprPrec)
922 // Okay, we know this is a binop.
924 getNextToken(); // eat binop
926 // Parse the primary expression after the binary operator.
927 ExprAST *RHS = ParsePrimary();
930 // If BinOp binds less tightly with RHS than the operator after RHS, let
931 // the pending operator take RHS as its LHS.
932 int NextPrec = GetTokPrecedence();
933 if (TokPrec < NextPrec) {
934 RHS = ParseBinOpRHS(TokPrec+1, RHS);
935 if (RHS == 0) return 0;
939 LHS = new BinaryExprAST(BinOp, LHS, RHS);
944 /// ::= primary binoprhs
946 static ExprAST *ParseExpression() {
947 ExprAST *LHS = ParsePrimary();
950 return ParseBinOpRHS(0, LHS);
954 /// ::= id '(' id* ')'
955 static PrototypeAST *ParsePrototype() {
956 if (CurTok != tok_identifier)
957 return ErrorP("Expected function name in prototype");
959 std::string FnName = IdentifierStr;
963 return ErrorP("Expected '(' in prototype");
965 std::vector<std::string> ArgNames;
966 while (getNextToken() == tok_identifier)
967 ArgNames.push_back(IdentifierStr);
969 return ErrorP("Expected ')' in prototype");
972 getNextToken(); // eat ')'.
974 return new PrototypeAST(FnName, ArgNames);
977 /// definition ::= 'def' prototype expression
978 static FunctionAST *ParseDefinition() {
979 getNextToken(); // eat def.
980 PrototypeAST *Proto = ParsePrototype();
981 if (Proto == 0) return 0;
983 if (ExprAST *E = ParseExpression())
984 return new FunctionAST(Proto, E);
988 /// toplevelexpr ::= expression
989 static FunctionAST *ParseTopLevelExpr() {
990 if (ExprAST *E = ParseExpression()) {
991 // Make an anonymous proto.
992 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
993 return new FunctionAST(Proto, E);
998 /// external ::= 'extern' prototype
999 static PrototypeAST *ParseExtern() {
1000 getNextToken(); // eat extern.
1001 return ParsePrototype();
1004 //===----------------------------------------------------------------------===//
1006 //===----------------------------------------------------------------------===//
1008 static Module *TheModule;
1009 static LLVMBuilder Builder;
1010 static std::map<std::string, Value*> NamedValues;
1012 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1014 Value *NumberExprAST::Codegen() {
1015 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1018 Value *VariableExprAST::Codegen() {
1019 // Look this variable up in the function.
1020 Value *V = NamedValues[Name];
1021 return V ? V : ErrorV("Unknown variable name");
1024 Value *BinaryExprAST::Codegen() {
1025 Value *L = LHS->Codegen();
1026 Value *R = RHS->Codegen();
1027 if (L == 0 || R == 0) return 0;
1030 case '+': return Builder.CreateAdd(L, R, "addtmp");
1031 case '-': return Builder.CreateSub(L, R, "subtmp");
1032 case '*': return Builder.CreateMul(L, R, "multmp");
1034 L = Builder.CreateFCmpULT(L, R, "multmp");
1035 // Convert bool 0/1 to double 0.0 or 1.0
1036 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1037 default: return ErrorV("invalid binary operator");
1041 Value *CallExprAST::Codegen() {
1042 // Look up the name in the global module table.
1043 Function *CalleeF = TheModule->getFunction(Callee);
1045 return ErrorV("Unknown function referenced");
1047 // If argument mismatch error.
1048 if (CalleeF->arg_size() != Args.size())
1049 return ErrorV("Incorrect # arguments passed");
1051 std::vector<Value*> ArgsV;
1052 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1053 ArgsV.push_back(Args[i]->Codegen());
1054 if (ArgsV.back() == 0) return 0;
1057 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1060 Function *PrototypeAST::Codegen() {
1061 // Make the function type: double(double,double) etc.
1062 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
1063 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1065 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1067 // If F conflicted, there was already something named 'Name'. If it has a
1068 // body, don't allow redefinition or reextern.
1069 if (F->getName() != Name) {
1070 // Delete the one we just made and get the existing one.
1071 F->eraseFromParent();
1072 F = TheModule->getFunction(Name);
1074 // If F already has a body, reject this.
1075 if (!F->empty()) {
1076 ErrorF("redefinition of function");
1080 // If F took a different number of args, reject.
1081 if (F->arg_size() != Args.size()) {
1082 ErrorF("redefinition of function with different # args");
1087 // Set names for all arguments.
1089 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1091 AI->setName(Args[Idx]);
1093 // Add arguments to variable symbol table.
1094 NamedValues[Args[Idx]] = AI;
1100 Function *FunctionAST::Codegen() {
1101 NamedValues.clear();
1103 Function *TheFunction = Proto->Codegen();
1104 if (TheFunction == 0)
1107 // Create a new basic block to start insertion into.
1108 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1109 Builder.SetInsertPoint(BB);
1111 if (Value *RetVal = Body->Codegen()) {
1112 // Finish off the function.
1113 Builder.CreateRet(RetVal);
1115 // Validate the generated code, checking for consistency.
1116 verifyFunction(*TheFunction);
1120 // Error reading body, remove function.
1121 TheFunction->eraseFromParent();
1125 //===----------------------------------------------------------------------===//
1126 // Top-Level parsing and JIT Driver
1127 //===----------------------------------------------------------------------===//
1129 static void HandleDefinition() {
1130 if (FunctionAST *F = ParseDefinition()) {
1131 if (Function *LF = F->Codegen()) {
1132 fprintf(stderr, "Read function definition:");
1136 // Skip token for error recovery.
1141 static void HandleExtern() {
1142 if (PrototypeAST *P = ParseExtern()) {
1143 if (Function *F = P->Codegen()) {
1144 fprintf(stderr, "Read extern: ");
1148 // Skip token for error recovery.
1153 static void HandleTopLevelExpression() {
1154 // Evaluate a top level expression into an anonymous function.
1155 if (FunctionAST *F = ParseTopLevelExpr()) {
1156 if (Function *LF = F->Codegen()) {
1157 fprintf(stderr, "Read top-level expression:");
1161 // Skip token for error recovery.
1166 /// top ::= definition | external | expression | ';'
1167 static void MainLoop() {
1169 fprintf(stderr, "ready> ");
1171 case tok_eof: return;
1172 case ';': getNextToken(); break; // ignore top level semicolons.
1173 case tok_def: HandleDefinition(); break;
1174 case tok_extern: HandleExtern(); break;
1175 default: HandleTopLevelExpression(); break;
1182 //===----------------------------------------------------------------------===//
1183 // "Library" functions that can be "extern'd" from user code.
1184 //===----------------------------------------------------------------------===//
1186 /// putchard - putchar that takes a double and returns 0.
1188 double putchard(double X) {
1193 //===----------------------------------------------------------------------===//
1194 // Main driver code.
1195 //===----------------------------------------------------------------------===//
1198 TheModule = new Module("my cool jit");
1200 // Install standard binary operators.
1201 // 1 is lowest precedence.
1202 BinopPrecedence['<'] = 10;
1203 BinopPrecedence['+'] = 20;
1204 BinopPrecedence['-'] = 20;
1205 BinopPrecedence['*'] = 40; // highest.
1207 // Prime the first token.
1208 fprintf(stderr, "ready> ");
1212 TheModule->dump();
1219 <!-- *********************************************************************** -->
1222 <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1223 src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
1224 <a href="http://validator.w3.org/check/referer"><img
1225 src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1227 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1228 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1229 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $