1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
20 <li><a href="#intro">Chapter 7 Introduction</a></li>
21 <li><a href="#why">Why is this a hard problem?</a></li>
22 <li><a href="#memory">Memory in LLVM</a></li>
23 <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
24 <li><a href="#adjustments">Adjusting Existing Variables for
26 <li><a href="#assignment">New Assignment Operator</a></li>
27 <li><a href="#localvars">User-defined Local Variables</a></li>
28 <li><a href="#code">Full Code Listing</a></li>
33 <div class="doc_author">
34 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
37 <!-- *********************************************************************** -->
38 <div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
39 <!-- *********************************************************************** -->
41 <div class="doc_text">
43 <p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
44 with LLVM</a>" tutorial. In chapters 1 through 6, we've built a very
45 respectable, albeit simple, <a
46 href="http://en.wikipedia.org/wiki/Functional_programming">functional
47 programming language</a>. In our journey, we learned some parsing techniques,
48 how to build and represent an AST, how to build LLVM IR, and how to optimize
49 the resultant code and JIT compile it.</p>
<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it. In particular, a functional language
53 makes it very easy to build LLVM IR directly in <a
54 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
55 Since LLVM requires that the input code be in SSA form, this is a very nice
56 property and it is often unclear to newcomers how to generate code for an
57 imperative language with mutable variables.</p>
59 <p>The short (and happy) summary of this chapter is that there is no need for
60 your front-end to build SSA form: LLVM provides highly tuned and well tested
61 support for this, though the way it works is a bit unexpected for some.</p>
65 <!-- *********************************************************************** -->
66 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
67 <!-- *********************************************************************** -->
69 <div class="doc_text">
72 To understand why mutable variables cause complexities in SSA construction,
73 consider this extremely simple C example:
76 <div class="doc_code">
79 int test(_Bool Condition) {
90 <p>In this case, we have the variable "X", whose value depends on the path
91 executed in the program. Because there are two different possible values for X
92 before the return instruction, a PHI node is inserted to merge the two values.
93 The LLVM IR that we want for this example looks like this:</p>
95 <div class="doc_code">
97 @G = weak global i32 0 ; type of @G is i32*
98 @H = weak global i32 0 ; type of @H is i32*
100 define i32 @test(i1 %Condition) {
102 br i1 %Condition, label %cond_true, label %cond_false
113 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
119 <p>In this example, the loads from the G and H global variables are explicit in
120 the LLVM IR, and they live in the then/else branches of the if statement
121 (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
122 in the cond_next block selects the right value to use based on where control
123 flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
125 the value of X.0. The intent of this chapter is not to explain the details of
126 SSA form. For more information, see one of the many <a
127 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
130 <p>The question for this article is "who places phi nodes when lowering
131 assignments to mutable variables?". The issue here is that LLVM
132 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
133 However, SSA construction requires non-trivial algorithms and data structures,
134 so it is inconvenient and wasteful for every front-end to have to reproduce this
139 <!-- *********************************************************************** -->
140 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
141 <!-- *********************************************************************** -->
143 <div class="doc_text">
145 <p>The 'trick' here is that while LLVM does require all register values to be
146 in SSA form, it does not require (or permit) memory objects to be in SSA form.
147 In the example above, note that the loads from G and H are direct accesses to
148 G and H: they are not renamed or versioned. This differs from some other
149 compiler systems, which do try to version memory objects. In LLVM, instead of
150 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
151 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
155 With this in mind, the high-level idea is that we want to make a stack variable
156 (which lives in memory, because it is on the stack) for each mutable object in
157 a function. To take advantage of this trick, we need to talk about how LLVM
158 represents stack variables.
161 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
162 it is carefully designed to not have (or need) an "address-of" operator. Notice
163 how the type of the @G/@H global variables is actually "i32*" even though the
164 variable is defined as "i32". What this means is that @G defines <em>space</em>
165 for an i32 in the global data area, but its <em>name</em> actually refers to the
166 address for that space. Stack variables work the same way, but instead of being
167 declared with global variable definitions, they are declared with the
168 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
170 <div class="doc_code">
172 define i32 @test(i1 %Condition) {
174 %X = alloca i32 ; type of %X is i32*.
176 %tmp = load i32* %X ; load the stack value %X from the stack.
177 %tmp2 = add i32 %tmp, 1 ; increment it
178 store i32 %tmp2, i32* %X ; store it back
183 <p>This code shows an example of how you can declare and manipulate a stack
184 variable in the LLVM IR. Stack memory allocated with the alloca instruction is
185 fully general: you can pass the address of the stack slot to functions, you can
186 store it in other variables, etc. In our example above, we could rewrite the
187 example to use the alloca technique to avoid using a PHI node:</p>
189 <div class="doc_code">
191 @G = weak global i32 0 ; type of @G is i32*
192 @H = weak global i32 0 ; type of @H is i32*
194 define i32 @test(i1 %Condition) {
196 %X = alloca i32 ; type of %X is i32*.
197 br i1 %Condition, label %cond_true, label %cond_false
201 store i32 %X.0, i32* %X ; Update X
206 store i32 %X.1, i32* %X ; Update X
210 %X.2 = load i32* %X ; Read X
216 <p>With this, we have discovered a way to handle arbitrary mutable variables
217 without the need to create Phi nodes at all:</p>
220 <li>Each mutable variable becomes a stack allocation.</li>
221 <li>Each read of the variable becomes a load from the stack.</li>
222 <li>Each update of the variable becomes a store to the stack.</li>
223 <li>Taking the address of a variable just uses the stack address directly.</li>
226 <p>While this solution has solved our immediate problem, it introduced another
227 one: we have now apparently introduced a lot of stack traffic for very simple
228 and common operations, a major performance problem. Fortunately for us, the
229 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
230 this case, promoting allocas like this into SSA registers, inserting Phi nodes
231 as appropriate. If you run this example through the pass, for example, you'll
234 <div class="doc_code">
236 $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b>
237 @G = weak global i32 0
238 @H = weak global i32 0
240 define i32 @test(i1 %Condition) {
242 br i1 %Condition, label %cond_true, label %cond_false
253 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
259 <p>The mem2reg pass implements the standard "iterated dominator frontier"
260 algorithm for constructing SSA form and has a number of optimizations that speed
261 up very common degenerate cases. mem2reg really is the answer for dealing with
262 mutable variables, and we highly recommend that you depend on it. Note that
263 mem2reg only works on variables in certain circumstances:</p>
266 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
267 promotes them. It does not apply to global variables or heap allocations.</li>
269 <li>mem2reg only looks for alloca instructions in the entry block of the
270 function. Being in the entry block guarantees that the alloca is only executed
271 once, which makes analysis simpler.</li>
273 <li>mem2reg only promotes allocas whose uses are direct loads and stores. If
274 the address of the stack object is passed to a function, or if any funny pointer
275 arithmetic is involved, the alloca will not be promoted.</li>
277 <li>mem2reg only works on allocas of <a
278 href="../LangRef.html#t_classifications">first class</a>
279 values (such as pointers, scalars and vectors), and only if the array size
280 of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
281 promoting structs or arrays to registers. Note that the "scalarrepl" pass is
282 more powerful and can promote structs, "unions", and arrays in many cases.</li>
287 All of these properties are easy to satisfy for most imperative languages, and
288 we'll illustrate this below with Kaleidoscope. The final question you may be
289 asking is: should I bother with this nonsense for my front-end? Wouldn't it be
290 better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that you use this technique
292 for building SSA form, unless there is an extremely good reason not to. Using
293 this technique is:</p>
296 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
297 mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
301 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
302 common cases as well as fully general. For example, it has fast-paths for
303 variables that are only used in a single block, variables that only have one
304 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
307 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
308 Debug information in LLVM</a> relies on having the address of the variable
309 exposed to attach debug info to it. This technique dovetails very naturally
310 with this style of debug info.</li>
313 <p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
320 <!-- *********************************************************************** -->
321 <div class="doc_section"><a name="kalvars">Mutable Variables in
322 Kaleidoscope</a></div>
323 <!-- *********************************************************************** -->
325 <div class="doc_text">
<p>Now that we know the sort of problem we want to tackle, let's see what this
328 looks like in the context of our little Kaleidoscope language. We're going to
329 add two features:</p>
332 <li>The ability to mutate variables with the '=' operator.</li>
333 <li>The ability to define new variables.</li>
336 <p>While the first item is really what this is about, we only have variables
337 for incoming arguments and for induction variables, and redefining them only
338 goes so far :). Also, the ability to define new variables is a
339 useful thing regardless of whether you will be mutating them. Here's a
340 motivating example that shows how we could use these:</p>
342 <div class="doc_code">
344 # Define ':' for sequencing: as a low-precedence operator that ignores operands
345 # and just returns the RHS.
346 def binary : 1 (x y) y;
348 # Recursive fib, we could do this before.
357 <b>var a = 1, b = 1, c in</b>
(for i = 3, i &lt; x in
370 In order to mutate variables, we have to change our existing variables to use
371 the "alloca trick". Once we have that, we'll add our new operator, then extend
372 Kaleidoscope to support new variable definitions.
377 <!-- *********************************************************************** -->
378 <div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
380 <!-- *********************************************************************** -->
382 <div class="doc_text">
385 The symbol table in Kaleidoscope is managed at code generation time by the
386 '<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
387 that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
389 the <em>memory location</em> of the variable in question. Note that this
390 change is a refactoring: it changes the structure of the code, but does not
391 (by itself) change the behavior of the compiler. All of these changes are
392 isolated in the Kaleidoscope code generator.</p>
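<p>To see why the table has to hold locations rather than values, here is a
minimal sketch in plain C++ with no LLVM involved. The names
<tt>SlotTable</tt>, <tt>loadVar</tt> and <tt>storeVar</tt> are hypothetical
stand-ins chosen for this illustration: a <tt>double*</tt> plays the role of
the alloca, a read through it plays the role of a load, and a write through it
plays the role of a store.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the new NamedValues: each name maps to the
// storage that holds the variable, not to the variable's current value.
typedef std::map<std::string, double*> SlotTable;

// Reading through the slot is the analogue of emitting a load.
double loadVar(const SlotTable &Slots, const std::string &Name) {
  SlotTable::const_iterator It = Slots.find(Name);
  assert(It != Slots.end() && "unknown variable name");
  return *It->second;
}

// Writing through the slot is the analogue of emitting a store.
void storeVar(const SlotTable &Slots, const std::string &Name, double V) {
  SlotTable::const_iterator It = Slots.find(Name);
  assert(It != Slots.end() && "unknown variable name");
  *It->second = V;
}
```

<p>Because the table entry never changes when the variable is updated, every
later lookup of the same name sees the new value. A table of plain values
would instead have to be rewritten at every assignment, which is exactly the
renaming work that SSA construction performs.</p>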
395 At this point in Kaleidoscope's development, it only supports variables for two
396 things: incoming arguments to functions and the induction variable of 'for'
397 loops. For consistency, we'll allow mutation of these variables in addition to
398 other user-defined variables. This means that these will both need memory
402 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
403 map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
will tell us which parts of the code we need to update:</p>
406 <div class="doc_code">
408 static std::map<std::string, AllocaInst*> NamedValues;
412 <p>Also, since we will need to create these alloca's, we'll use a helper
413 function that ensures that the allocas are created in the entry block of the
416 <div class="doc_code">
418 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
419 /// the function. This is used for mutable variables etc.
420 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
421 const std::string &VarName) {
422 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
423 TheFunction->getEntryBlock().begin());
424 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
429 <p>This funny looking code creates an LLVMBuilder object that is pointing at
430 the first instruction (.begin()) of the entry block. It then creates an alloca
431 with the expected name and returns it. Because all values in Kaleidoscope are
432 doubles, there is no need to pass in a type to use.</p>
434 <p>With this in place, the first functionality change we want to make is to
435 variable references. In our new scheme, variables live on the stack, so code
436 generating a reference to them actually needs to produce a load from the stack
439 <div class="doc_code">
441 Value *VariableExprAST::Codegen() {
442 // Look this variable up in the function.
443 Value *V = NamedValues[Name];
444 if (V == 0) return ErrorV("Unknown variable name");
447 return Builder.CreateLoad(V, Name.c_str());
452 <p>As you can see, this is pretty straight-forward. Next we need to update the
453 things that define the variables to set up the alloca. We'll start with
454 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
455 the unabridged code):</p>
457 <div class="doc_code">
459 Function *TheFunction = Builder.GetInsertBlock()->getParent();
461 <b>// Create an alloca for the variable in the entry block.
462 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
464 // Emit the start code first, without 'variable' in scope.
465 Value *StartVal = Start->Codegen();
466 if (StartVal == 0) return 0;
468 <b>// Store the value into the alloca.
469 Builder.CreateStore(StartVal, Alloca);</b>
472 // Compute the end condition.
473 Value *EndCond = End->Codegen();
474 if (EndCond == 0) return EndCond;
476 <b>// Reload, increment, and restore the alloca. This handles the case where
477 // the body of the loop mutates the variable.
478 Value *CurVar = Builder.CreateLoad(Alloca);
479 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
480 Builder.CreateStore(NextVar, Alloca);</b>
485 <p>This code is virtually identical to the code <a
486 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
487 big difference is that we no longer have to construct a PHI node, and we use
488 load/store to access the variable as needed.</p>
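<p>The shape of the generated loop can be mimicked in plain C++. This is only
a sketch under simplifying assumptions: the end condition is fixed to
<tt>i &lt; End</tt>, and <tt>LoopBody</tt>, <tt>leaveAlone</tt>,
<tt>bumpByOne</tt> and <tt>runLoop</tt> are hypothetical names invented for
the illustration. It follows the order of the listing above: body, end
condition, then reload, increment and store.</p>

```cpp
#include <cassert>

// Hypothetical loop body type: it receives the address of the induction
// variable's slot, so it may read or overwrite the variable.
typedef void (*LoopBody)(double *Slot);

void leaveAlone(double *) {}
void bumpByOne(double *Slot) { *Slot = *Slot + 1.0; }

// Runs "for i = Start, i < End, Step" and returns how many times the body
// executed.
int runLoop(double Start, double End, double Step, LoopBody Body) {
  double Slot = Start;  // the alloca plus the initial store
  int Trips = 0;
  bool Continue;
  do {
    Body(&Slot);            // the body may mutate the variable
    ++Trips;
    Continue = Slot < End;  // end condition, evaluated before the increment
    double Cur = Slot;      // reload: picks up any change made by the body
    Slot = Cur + Step;      // increment and store back
  } while (Continue);
  return Trips;
}
```

<p>With <tt>Start = 1</tt>, <tt>End = 3</tt> and <tt>Step = 1</tt>, the loop
runs three times when the body leaves the variable alone and twice when the
body adds one to it, because the reload sees the body's store.</p>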
490 <p>To support mutable argument variables, we need to also make allocas for them.
491 The code for this is also pretty simple:</p>
493 <div class="doc_code">
495 /// CreateArgumentAllocas - Create an alloca for each argument and register the
496 /// argument in the symbol table so that references to it will succeed.
497 void PrototypeAST::CreateArgumentAllocas(Function *F) {
498 Function::arg_iterator AI = F->arg_begin();
499 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
500 // Create an alloca for this variable.
501 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
503 // Store the initial value into the alloca.
504 Builder.CreateStore(AI, Alloca);
506 // Add arguments to variable symbol table.
507 NamedValues[Args[Idx]] = Alloca;
513 <p>For each argument, we make an alloca, store the input value to the function
514 into the alloca, and register the alloca as the memory location for the
515 argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
516 it sets up the entry block for the function.</p>
518 <p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
519 good codegen once again:</p>
521 <div class="doc_code">
523 // Set up the optimizer pipeline. Start with registering info about how the
524 // target lays out data structures.
525 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
526 <b>// Promote allocas to registers.
527 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
528 // Do simple "peephole" optimizations and bit-twiddling optzns.
529 OurFPM.add(createInstructionCombiningPass());
530 // Reassociate expressions.
531 OurFPM.add(createReassociatePass());
535 <p>It is interesting to see what the code looks like before and after the
536 mem2reg optimization runs. For example, this is the before/after code for our
537 recursive fib. Before the optimization:</p>
539 <div class="doc_code">
541 define double @fib(double %x) {
543 <b>%x1 = alloca double
544 store double %x, double* %x1
545 %x2 = load double* %x1</b>
546 %multmp = fcmp ult double %x2, 3.000000e+00
547 %booltmp = uitofp i1 %multmp to double
548 %ifcond = fcmp one double %booltmp, 0.000000e+00
549 br i1 %ifcond, label %then, label %else
551 then: ; preds = %entry
554 else: ; preds = %entry
555 <b>%x3 = load double* %x1</b>
556 %subtmp = sub double %x3, 1.000000e+00
557 %calltmp = call double @fib( double %subtmp )
558 <b>%x4 = load double* %x1</b>
559 %subtmp5 = sub double %x4, 2.000000e+00
560 %calltmp6 = call double @fib( double %subtmp5 )
561 %addtmp = add double %calltmp, %calltmp6
564 ifcont: ; preds = %else, %then
565 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
571 <p>Here there is only one variable (x, the input argument) but you can still
572 see the extremely simple-minded code generation strategy we are using. In the
573 entry block, an alloca is created, and the initial input value is stored into
574 it. Each reference to the variable does a reload from the stack. Also, note
575 that we didn't modify the if/then/else expression, so it still inserts a PHI
576 node. While we could make an alloca for it, it is actually easier to create a
577 PHI node for it, so we still just make the PHI.</p>
579 <p>Here is the code after the mem2reg pass runs:</p>
581 <div class="doc_code">
583 define double @fib(double %x) {
585 %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
586 %booltmp = uitofp i1 %multmp to double
587 %ifcond = fcmp one double %booltmp, 0.000000e+00
588 br i1 %ifcond, label %then, label %else
594 %subtmp = sub double <b>%x</b>, 1.000000e+00
595 %calltmp = call double @fib( double %subtmp )
596 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
597 %calltmp6 = call double @fib( double %subtmp5 )
598 %addtmp = add double %calltmp, %calltmp6
601 ifcont: ; preds = %else, %then
602 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
608 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
609 variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>
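<p>The single-assignment situation is simple enough to sketch in plain C++.
The code below is not LLVM's implementation and uses a hypothetical miniature
IR (<tt>Inst</tt>, <tt>makeInst</tt>, <tt>promoteSingleStore</tt> are invented
names). It assumes, without checking, that the one store executes before every
load, which holds for an argument stored in the entry block.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical miniature IR, not LLVM's: a function is a flat instruction list.
struct Inst {
  enum Kind { Alloca, Store, Load, Other };
  Kind K;
  std::string Def;               // name this instruction defines, "" for Store
  std::vector<std::string> Ops;  // Store: {value, slot}; Load: {slot}
};

Inst makeInst(Inst::Kind K, const std::string &Def,
              const std::string &Op0 = "", const std::string &Op1 = "") {
  Inst I;
  I.K = K;
  I.Def = Def;
  if (!Op0.empty()) I.Ops.push_back(Op0);
  if (!Op1.empty()) I.Ops.push_back(Op1);
  return I;
}

static bool isStoreTo(const Inst &I, const std::string &Slot) {
  return I.K == Inst::Store && I.Ops.size() == 2 && I.Ops[1] == Slot &&
         I.Ops[0] != Slot;
}

static bool isLoadFrom(const Inst &I, const std::string &Slot) {
  return I.K == Inst::Load && I.Ops.size() == 1 && I.Ops[0] == Slot;
}

// Promote Slot if it has exactly one store and is otherwise only loaded from.
// Returns false and leaves Code untouched if the slot does not qualify.
bool promoteSingleStore(std::vector<Inst> &Code, const std::string &Slot) {
  std::string Stored;
  unsigned NumStores = 0;
  for (std::size_t i = 0; i != Code.size(); ++i) {
    const Inst &I = Code[i];
    if (isStoreTo(I, Slot)) { Stored = I.Ops[0]; ++NumStores; continue; }
    if (isLoadFrom(I, Slot)) continue;
    for (std::size_t j = 0; j != I.Ops.size(); ++j)
      if (I.Ops[j] == Slot) return false;  // the address escapes
  }
  if (NumStores != 1) return false;

  std::map<std::string, std::string> Replace;  // load result -> stored value
  std::vector<Inst> Out;
  for (std::size_t i = 0; i != Code.size(); ++i) {
    const Inst &I = Code[i];
    if (I.K == Inst::Alloca && I.Def == Slot) continue;  // drop the alloca
    if (isStoreTo(I, Slot)) continue;                    // drop the store
    if (isLoadFrom(I, Slot)) { Replace[I.Def] = Stored; continue; }
    Inst Copy = I;
    for (std::size_t j = 0; j != Copy.Ops.size(); ++j) {
      std::map<std::string, std::string>::const_iterator It =
          Replace.find(Copy.Ops[j]);
      if (It != Replace.end()) Copy.Ops[j] = It->second;
    }
    Out.push_back(Copy);
  }
  Code.swap(Out);
  return true;
}
```

<p>Fed the alloca, store and loads of <tt>%x1</tt> from the fib example, the
sketch leaves only the arithmetic, with each use of a loaded value rewritten
to <tt>x</tt>. A slot whose address is passed to another instruction is
rejected, mirroring the restriction that only direct loads and stores are
promotable.</p>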
612 <p>After the rest of the optimizers run, we get:</p>
614 <div class="doc_code">
616 define double @fib(double %x) {
618 %multmp = fcmp ult double %x, 3.000000e+00
619 %booltmp = uitofp i1 %multmp to double
620 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
621 br i1 %ifcond, label %else, label %ifcont
624 %subtmp = sub double %x, 1.000000e+00
625 %calltmp = call double @fib( double %subtmp )
626 %subtmp5 = sub double %x, 2.000000e+00
627 %calltmp6 = call double @fib( double %subtmp5 )
628 %addtmp = add double %calltmp, %calltmp6
632 ret double 1.000000e+00
637 <p>Here we see that the simplifycfg pass decided to clone the return instruction
638 into the end of the 'else' block. This allowed it to eliminate some branches
639 and the PHI node.</p>
641 <p>Now that all symbol table references are updated to use stack variables,
642 we'll add the assignment operator.</p>
646 <!-- *********************************************************************** -->
647 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
648 <!-- *********************************************************************** -->
650 <div class="doc_text">
652 <p>With our current framework, adding a new assignment operator is really
653 simple. We will parse it just like any other binary operator, but handle it
654 internally (instead of allowing the user to define it). The first step is to
655 set a precedence:</p>
657 <div class="doc_code">
660 // Install standard binary operators.
661 // 1 is lowest precedence.
662 <b>BinopPrecedence['='] = 2;</b>
663 BinopPrecedence['<'] = 10;
664 BinopPrecedence['+'] = 20;
665 BinopPrecedence['-'] = 20;
669 <p>Now that the parser knows the precedence of the binary operator, it takes
670 care of all the parsing and AST generation. We just need to implement codegen
671 for the assignment operator. This looks like:</p>
673 <div class="doc_code">
675 Value *BinaryExprAST::Codegen() {
676 // Special case '=' because we don't want to emit the LHS as an expression.
678 // Assignment requires the LHS to be an identifier.
679 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
681 return ErrorV("destination of '=' must be a variable");
685 <p>Unlike the rest of the binary operators, our assignment operator doesn't
686 follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
687 as a special case before the other binary operators are handled. The other
688 strange thing about it is that it requires the LHS to be a variable directly.
691 <div class="doc_code">
694 Value *Val = RHS->Codegen();
695 if (Val == 0) return 0;
698 Value *Variable = NamedValues[LHSE->getName()];
699 if (Variable == 0) return ErrorV("Unknown variable name");
701 Builder.CreateStore(Val, Variable);
708 <p>Once it has the variable, codegen'ing the assignment is straight-forward:
709 we emit the RHS of the assignment, create a store, and return the computed
710 value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
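<p>The same control flow can be tried out without LLVM. The sketch below uses
a hypothetical miniature AST (<tt>Expr</tt>, <tt>NumberExpr</tt>,
<tt>VariableExpr</tt>, <tt>AssignExpr</tt>) that is evaluated directly instead
of being lowered to IR, with a global <tt>Slots</tt> map standing in for
<tt>NamedValues</tt>. It is an illustration of the ordering only: check the
LHS, evaluate the RHS, store, return the value.</p>

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for NamedValues: name -> current contents of the slot.
static std::map<std::string, double> Slots;

struct Expr {
  virtual ~Expr() {}
  virtual double eval() = 0;
};

struct NumberExpr : Expr {
  double Val;
  explicit NumberExpr(double V) : Val(V) {}
  double eval() { return Val; }
};

struct VariableExpr : Expr {
  std::string Name;
  explicit VariableExpr(const std::string &N) : Name(N) {}
  double eval() {
    std::map<std::string, double>::iterator It = Slots.find(Name);
    if (It == Slots.end()) throw std::runtime_error("Unknown variable name");
    return It->second;  // the "load"
  }
};

struct AssignExpr : Expr {
  Expr *LHS, *RHS;
  AssignExpr(Expr *L, Expr *R) : LHS(L), RHS(R) {}
  double eval() {
    // The LHS is never evaluated as an expression; it must name a variable.
    VariableExpr *LHSE = dynamic_cast<VariableExpr*>(LHS);
    if (!LHSE)
      throw std::runtime_error("destination of '=' must be a variable");
    double Val = RHS->eval();  // evaluate the RHS
    std::map<std::string, double>::iterator It = Slots.find(LHSE->Name);
    if (It == Slots.end()) throw std::runtime_error("Unknown variable name");
    It->second = Val;  // the "store"
    return Val;        // returning the value is what makes chaining work
  }
};
```

<p>Building <tt>x = (y = 7)</tt> from these nodes and evaluating it leaves 7
in both slots, while an assignment whose LHS is a number is rejected before
the RHS is ever evaluated.</p>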
712 <p>Now that we have an assignment operator, we can mutate loop variables and
713 arguments. For example, we can now run code like this:</p>
715 <div class="doc_code">
717 # Function to print a double.
720 # Define ':' for sequencing: as a low-precedence operator that ignores operands
721 # and just returns the RHS.
722 def binary : 1 (x y) y;
733 <p>When run, this example prints "123" and then "4", showing that we did
734 actually mutate the value! Okay, we have now officially implemented our goal:
735 getting this to work requires SSA construction in the general case. However,
to be really useful, we want the ability to define our own local variables, let's
742 <!-- *********************************************************************** -->
743 <div class="doc_section"><a name="localvars">User-defined Local
745 <!-- *********************************************************************** -->
747 <div class="doc_text">
<p>Adding var/in is just like any of the other extensions we made to
750 Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
751 The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>
754 <div class="doc_code">
763 static int gettok() {
765 if (IdentifierStr == "in") return tok_in;
766 if (IdentifierStr == "binary") return tok_binary;
767 if (IdentifierStr == "unary") return tok_unary;
768 <b>if (IdentifierStr == "var") return tok_var;</b>
769 return tok_identifier;
774 <p>The next step is to define the AST node that we will construct. For var/in,
775 it will look like this:</p>
777 <div class="doc_code">
779 /// VarExprAST - Expression class for var/in
780 class VarExprAST : public ExprAST {
781 std::vector<std::pair<std::string, ExprAST*> > VarNames;
784 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
786 : VarNames(varnames), Body(body) {}
788 virtual Value *Codegen();
793 <p>var/in allows a list of names to be defined all at once, and each name can
794 optionally have an initializer value. As such, we capture this information in
the VarNames vector. Also, var/in has a body, which is allowed to access
the variables defined by the var/in.</p>
798 <p>With this ready, we can define the parser pieces. First thing we do is add
799 it as a primary expression:</p>
801 <div class="doc_code">
804 /// ::= identifierexpr
809 <b>/// ::= varexpr</b>
810 static ExprAST *ParsePrimary() {
812 default: return Error("unknown token when expecting an expression");
813 case tok_identifier: return ParseIdentifierExpr();
814 case tok_number: return ParseNumberExpr();
815 case '(': return ParseParenExpr();
816 case tok_if: return ParseIfExpr();
817 case tok_for: return ParseForExpr();
818 <b>case tok_var: return ParseVarExpr();</b>
824 <p>Next we define ParseVarExpr:</p>
826 <div class="doc_code">
828 /// varexpr ::= 'var' identifier ('=' expression)?
829 // (',' identifier ('=' expression)?)* 'in' expression
830 static ExprAST *ParseVarExpr() {
831 getNextToken(); // eat the var.
833 std::vector<std::pair<std::string, ExprAST*> > VarNames;
835 // At least one variable name is required.
836 if (CurTok != tok_identifier)
837 return Error("expected identifier after var");
841 <p>The first part of this code parses the list of identifier/expr pairs into the
842 local <tt>VarNames</tt> vector.
844 <div class="doc_code">
847 std::string Name = IdentifierStr;
848 getNextToken(); // eat identifier.
850 // Read the optional initializer.
853 getNextToken(); // eat the '='.
855 Init = ParseExpression();
856 if (Init == 0) return 0;
859 VarNames.push_back(std::make_pair(Name, Init));
861 // End of var list, exit loop.
862 if (CurTok != ',') break;
863 getNextToken(); // eat the ','.
865 if (CurTok != tok_identifier)
866 return Error("expected identifier list after var");
871 <p>Once all the variables are parsed, we then parse the body and create the
874 <div class="doc_code">
876 // At this point, we have to have 'in'.
877 if (CurTok != tok_in)
878 return Error("expected 'in' keyword after 'var'");
879 getNextToken(); // eat 'in'.
881 ExprAST *Body = ParseExpression();
882 if (Body == 0) return 0;
884 return new VarExprAST(VarNames, Body);
889 <p>Now that we can parse and represent the code, we need to support emission of
890 LLVM IR for it. This code starts out with:</p>
892 <div class="doc_code">
894 Value *VarExprAST::Codegen() {
895 std::vector<AllocaInst *> OldBindings;
897 Function *TheFunction = Builder.GetInsertBlock()->getParent();
899 // Register all variables and emit their initializer.
900 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
901 const std::string &VarName = VarNames[i].first;
902 ExprAST *Init = VarNames[i].second;
906 <p>Basically it loops over all the variables, installing them one at a time.
907 For each variable we put into the symbol table, we remember the previous value
908 that we replace in OldBindings.</p>
910 <div class="doc_code">
912 // Emit the initializer before adding the variable to scope, this prevents
913 // the initializer from referencing the variable itself, and permits stuff
916 // var a = a in ... # refers to outer 'a'.
919 InitVal = Init->Codegen();
920 if (InitVal == 0) return 0;
921 } else { // If not specified, use 0.0.
922 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
925 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
926 Builder.CreateStore(InitVal, Alloca);
928 // Remember the old variable binding so that we can restore the binding when
930 OldBindings.push_back(NamedValues[VarName]);
932 // Remember this binding.
933 NamedValues[VarName] = Alloca;
938 <p>There are more comments here than code. The basic idea is that we emit the
939 initializer, create the alloca, then update the symbol table to point to it.
940 Once all the variables are installed in the symbol table, we evaluate the body
941 of the var/in expression:</p>
943 <div class="doc_code">
945 // Codegen the body, now that all vars are in scope.
946 Value *BodyVal = Body->Codegen();
947 if (BodyVal == 0) return 0;
951 <p>Finally, before returning, we restore the previous variable bindings:</p>
953 <div class="doc_code">
955 // Pop all our variables from scope.
956 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
957 NamedValues[VarNames[i].first] = OldBindings[i];
959 // Return the body computation.
965 <p>The end result of all of this is that we get properly scoped variable
966 definitions, and we even (trivially) allow mutation of them :).</p>
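<p>The save-and-restore pattern is independent of LLVM and can be sketched on
its own. In the sketch below, <tt>SymbolTable</tt> and
<tt>ScopedBindings</tt> are hypothetical names, an <tt>int</tt> slot id stands
in for the <tt>AllocaInst*</tt>, and 0 means "not bound". It departs from the
code above in two labeled ways: the restore runs in a destructor rather than
in an explicit loop, and it runs in reverse order so that a name listed twice
in one <tt>var</tt> list is still restored to its outer binding.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NamedValues: name -> slot id, 0 meaning unbound.
typedef std::map<std::string, int> SymbolTable;
typedef std::vector<std::pair<std::string, int> > BindingList;

// Installs a list of bindings on construction and puts the previous bindings
// back on destruction.
class ScopedBindings {
  SymbolTable &Table;
  BindingList Saved;  // (name, binding that was replaced)

public:
  ScopedBindings(SymbolTable &T, const BindingList &Vars) : Table(T) {
    for (std::size_t i = 0, e = Vars.size(); i != e; ++i) {
      // Remember the old binding (0 if there was none), then shadow it.
      Saved.push_back(std::make_pair(Vars[i].first, Table[Vars[i].first]));
      Table[Vars[i].first] = Vars[i].second;
    }
  }

  ~ScopedBindings() {
    // Undo in reverse so the oldest saved binding is restored last.
    for (std::size_t i = Saved.size(); i != 0; --i)
      Table[Saved[i - 1].first] = Saved[i - 1].second;
  }
};
```

<p>While a <tt>ScopedBindings</tt> object is alive, lookups see the inner
variables; once it goes out of scope, an outer variable of the same name is
visible again and names that were previously unbound map back to 0.</p>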
968 <p>With this, we completed what we set out to do. Our nice iterative fib
969 example from the intro compiles and runs just fine. The mem2reg pass optimizes
970 all of our stack variables into SSA registers, inserting PHI nodes where needed,
971 and our front-end remains simple: no iterated dominator frontier computation
972 anywhere in sight.</p>
976 <!-- *********************************************************************** -->
977 <div class="doc_section"><a name="code">Full Code Listing</a></div>
978 <!-- *********************************************************************** -->
980 <div class="doc_text">
983 Here is the complete code listing for our running example, enhanced with mutable
984 variables and var/in support. To build this example, use:
987 <div class="doc_code">
990 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
996 <p>Here is the code:</p>
998 <div class="doc_code">
1000 #include "llvm/DerivedTypes.h"
1001 #include "llvm/ExecutionEngine/ExecutionEngine.h"
1002 #include "llvm/Module.h"
1003 #include "llvm/ModuleProvider.h"
1004 #include "llvm/PassManager.h"
1005 #include "llvm/Analysis/Verifier.h"
1006 #include "llvm/Target/TargetData.h"
1007 #include "llvm/Transforms/Scalar.h"
1008 #include "llvm/Support/LLVMBuilder.h"
1009 #include <cstdio>
1010 #include <string>
1011 #include <map>
1012 #include <vector>
1013 using namespace llvm;
1015 //===----------------------------------------------------------------------===//
1017 //===----------------------------------------------------------------------===//
1019 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1020 // of these for known things.
1025 tok_def = -2, tok_extern = -3,
1028 tok_identifier = -4, tok_number = -5,
1031 tok_if = -6, tok_then = -7, tok_else = -8,
1032 tok_for = -9, tok_in = -10,
1035 tok_binary = -11, tok_unary = -12,
1041 static std::string IdentifierStr; // Filled in if tok_identifier
1042 static double NumVal; // Filled in if tok_number
1044 /// gettok - Return the next token from standard input.
1045 static int gettok() {
1046 static int LastChar = ' ';
1048 // Skip any whitespace.
1049 while (isspace(LastChar))
1050 LastChar = getchar();
1052 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1053 IdentifierStr = LastChar;
1054 while (isalnum((LastChar = getchar())))
1055 IdentifierStr += LastChar;
1057 if (IdentifierStr == "def") return tok_def;
1058 if (IdentifierStr == "extern") return tok_extern;
1059 if (IdentifierStr == "if") return tok_if;
1060 if (IdentifierStr == "then") return tok_then;
1061 if (IdentifierStr == "else") return tok_else;
1062 if (IdentifierStr == "for") return tok_for;
1063 if (IdentifierStr == "in") return tok_in;
1064 if (IdentifierStr == "binary") return tok_binary;
1065 if (IdentifierStr == "unary") return tok_unary;
1066 if (IdentifierStr == "var") return tok_var;
1067 return tok_identifier;
1070 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1074 LastChar = getchar();
1075 } while (isdigit(LastChar) || LastChar == '.');
1077 NumVal = strtod(NumStr.c_str(), 0);
1081 if (LastChar == '#') {
1082 // Comment until end of line.
1083 do LastChar = getchar();
1084 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
1086 if (LastChar != EOF)
1090 // Check for end of file. Don't eat the EOF.
1091 if (LastChar == EOF)
1094 // Otherwise, just return the character as its ascii value.
1095 int ThisChar = LastChar;
1096 LastChar = getchar();
1100 //===----------------------------------------------------------------------===//
1101 // Abstract Syntax Tree (aka Parse Tree)
1102 //===----------------------------------------------------------------------===//
1104 /// ExprAST - Base class for all expression nodes.
1107 virtual ~ExprAST() {}
1108 virtual Value *Codegen() = 0;
1111 /// NumberExprAST - Expression class for numeric literals like "1.0".
1112 class NumberExprAST : public ExprAST {
1115 NumberExprAST(double val) : Val(val) {}
1116 virtual Value *Codegen();
1119 /// VariableExprAST - Expression class for referencing a variable, like "a".
1120 class VariableExprAST : public ExprAST {
1123 VariableExprAST(const std::string &name) : Name(name) {}
1124 const std::string &getName() const { return Name; }
1125 virtual Value *Codegen();
1128 /// UnaryExprAST - Expression class for a unary operator.
1129 class UnaryExprAST : public ExprAST {
1133 UnaryExprAST(char opcode, ExprAST *operand)
1134 : Opcode(opcode), Operand(operand) {}
1135 virtual Value *Codegen();
1138 /// BinaryExprAST - Expression class for a binary operator.
1139 class BinaryExprAST : public ExprAST {
1143 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1144 : Op(op), LHS(lhs), RHS(rhs) {}
1145 virtual Value *Codegen();
1148 /// CallExprAST - Expression class for function calls.
1149 class CallExprAST : public ExprAST {
1151 std::vector<ExprAST*> Args;
1153 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
1154 : Callee(callee), Args(args) {}
1155 virtual Value *Codegen();
1158 /// IfExprAST - Expression class for if/then/else.
1159 class IfExprAST : public ExprAST {
1160 ExprAST *Cond, *Then, *Else;
1162 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1163 : Cond(cond), Then(then), Else(_else) {}
1164 virtual Value *Codegen();
1167 /// ForExprAST - Expression class for for/in.
1168 class ForExprAST : public ExprAST {
1169 std::string VarName;
1170 ExprAST *Start, *End, *Step, *Body;
1172 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
1173 ExprAST *step, ExprAST *body)
1174 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1175 virtual Value *Codegen();
1178 /// VarExprAST - Expression class for var/in
1179 class VarExprAST : public ExprAST {
1180 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1183 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
1185 : VarNames(varnames), Body(body) {}
1187 virtual Value *Codegen();
1190 /// PrototypeAST - This class represents the "prototype" for a function,
1191 /// which captures its argument names as well as if it is an operator.
1192 class PrototypeAST {
1194 std::vector<std::string> Args;
1196 unsigned Precedence; // Precedence if a binary op.
1198 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
1199 bool isoperator = false, unsigned prec = 0)
1200 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1202 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
1203 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
1205 char getOperatorName() const {
1206 assert(isUnaryOp() || isBinaryOp());
1207 return Name[Name.size()-1];
1210 unsigned getBinaryPrecedence() const { return Precedence; }
1212 Function *Codegen();
1214 void CreateArgumentAllocas(Function *F);
1217 /// FunctionAST - This class represents a function definition itself.
1219 PrototypeAST *Proto;
1222 FunctionAST(PrototypeAST *proto, ExprAST *body)
1223 : Proto(proto), Body(body) {}
1225 Function *Codegen();
1228 //===----------------------------------------------------------------------===//
1230 //===----------------------------------------------------------------------===//
1232 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
1233 /// token the parser is looking at. getNextToken reads another token from the
1234 /// lexer and updates CurTok with its results.
1236 static int getNextToken() {
1237 return CurTok = gettok();
1240 /// BinopPrecedence - This holds the precedence for each binary operator that is
1242 static std::map<char, int> BinopPrecedence;
1244 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1245 static int GetTokPrecedence() {
1246 if (!isascii(CurTok))
1249 // Make sure it's a declared binop.
1250 int TokPrec = BinopPrecedence[CurTok];
1251 if (TokPrec <= 0) return -1;
1255 /// Error* - These are little helper functions for error handling.
1256 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1257 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1258 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1260 static ExprAST *ParseExpression();
1264 /// ::= identifier '(' expression* ')'
1265 static ExprAST *ParseIdentifierExpr() {
1266 std::string IdName = IdentifierStr;
1268 getNextToken(); // eat identifier.
1270 if (CurTok != '(') // Simple variable ref.
1271 return new VariableExprAST(IdName);
1274 getNextToken(); // eat (
1275 std::vector<ExprAST*> Args;
1276 if (CurTok != ')') {
1278 ExprAST *Arg = ParseExpression();
1280 Args.push_back(Arg);
1282 if (CurTok == ')') break;
1285 return Error("Expected ')'");
1293 return new CallExprAST(IdName, Args);
1296 /// numberexpr ::= number
1297 static ExprAST *ParseNumberExpr() {
1298 ExprAST *Result = new NumberExprAST(NumVal);
1299 getNextToken(); // consume the number
1303 /// parenexpr ::= '(' expression ')'
1304 static ExprAST *ParseParenExpr() {
1305 getNextToken(); // eat (.
1306 ExprAST *V = ParseExpression();
1310 return Error("expected ')'");
1311 getNextToken(); // eat ).
1315 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1316 static ExprAST *ParseIfExpr() {
1317 getNextToken(); // eat the if.
1320 ExprAST *Cond = ParseExpression();
1321 if (!Cond) return 0;
1323 if (CurTok != tok_then)
1324 return Error("expected then");
1325 getNextToken(); // eat the then
1327 ExprAST *Then = ParseExpression();
1328 if (Then == 0) return 0;
1330 if (CurTok != tok_else)
1331 return Error("expected else");
1335 ExprAST *Else = ParseExpression();
1336 if (!Else) return 0;
1338 return new IfExprAST(Cond, Then, Else);
1341 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1342 static ExprAST *ParseForExpr() {
1343 getNextToken(); // eat the for.
1345 if (CurTok != tok_identifier)
1346 return Error("expected identifier after for");
1348 std::string IdName = IdentifierStr;
1349 getNextToken(); // eat identifier.
1352 return Error("expected '=' after for");
1353 getNextToken(); // eat '='.
1356 ExprAST *Start = ParseExpression();
1357 if (Start == 0) return 0;
1359 return Error("expected ',' after for start value");
1362 ExprAST *End = ParseExpression();
1363 if (End == 0) return 0;
1365 // The step value is optional.
1367 if (CurTok == ',') {
1369 Step = ParseExpression();
1370 if (Step == 0) return 0;
1373 if (CurTok != tok_in)
1374 return Error("expected 'in' after for");
1375 getNextToken(); // eat 'in'.
1377 ExprAST *Body = ParseExpression();
1378 if (Body == 0) return 0;
1380 return new ForExprAST(IdName, Start, End, Step, Body);
1383 /// varexpr ::= 'var' identifier ('=' expression)?
1384 /// (',' identifier ('=' expression)?)* 'in' expression
1385 static ExprAST *ParseVarExpr() {
1386 getNextToken(); // eat the var.
1388 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1390 // At least one variable name is required.
1391 if (CurTok != tok_identifier)
1392 return Error("expected identifier after var");
1395 std::string Name = IdentifierStr;
1396 getNextToken(); // eat identifier.
1398 // Read the optional initializer.
1400 if (CurTok == '=') {
1401 getNextToken(); // eat the '='.
1403 Init = ParseExpression();
1404 if (Init == 0) return 0;
1407 VarNames.push_back(std::make_pair(Name, Init));
1409 // End of var list, exit loop.
1410 if (CurTok != ',') break;
1411 getNextToken(); // eat the ','.
1413 if (CurTok != tok_identifier)
1414 return Error("expected identifier list after var");
1417 // At this point, we have to have 'in'.
1418 if (CurTok != tok_in)
1419 return Error("expected 'in' keyword after 'var'");
1420 getNextToken(); // eat 'in'.
1422 ExprAST *Body = ParseExpression();
1423 if (Body == 0) return 0;
1425 return new VarExprAST(VarNames, Body);
1430 /// ::= identifierexpr
1436 static ExprAST *ParsePrimary() {
1438 default: return Error("unknown token when expecting an expression");
1439 case tok_identifier: return ParseIdentifierExpr();
1440 case tok_number: return ParseNumberExpr();
1441 case '(': return ParseParenExpr();
1442 case tok_if: return ParseIfExpr();
1443 case tok_for: return ParseForExpr();
1444 case tok_var: return ParseVarExpr();
1451 static ExprAST *ParseUnary() {
1452 // If the current token is not an operator, it must be a primary expr.
1453 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1454 return ParsePrimary();
1456 // If this is a unary operator, read it.
1459 if (ExprAST *Operand = ParseUnary())
1460 return new UnaryExprAST(Opc, Operand);
1465 /// ::= ('+' unary)*
1466 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1467 // If this is a binop, find its precedence.
1469 int TokPrec = GetTokPrecedence();
1471 // If this is a binop that binds at least as tightly as the current binop,
1472 // consume it, otherwise we are done.
1473 if (TokPrec < ExprPrec)
1476 // Okay, we know this is a binop.
1478 getNextToken(); // eat binop
1480 // Parse the unary expression after the binary operator.
1481 ExprAST *RHS = ParseUnary();
1484 // If BinOp binds less tightly with RHS than the operator after RHS, let
1485 // the pending operator take RHS as its LHS.
1486 int NextPrec = GetTokPrecedence();
1487 if (TokPrec < NextPrec) {
1488 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1489 if (RHS == 0) return 0;
1493 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1498 /// ::= unary binoprhs
1500 static ExprAST *ParseExpression() {
1501 ExprAST *LHS = ParseUnary();
1504 return ParseBinOpRHS(0, LHS);
1508 /// ::= id '(' id* ')'
1509 /// ::= binary LETTER number? (id, id)
1510 /// ::= unary LETTER (id)
1511 static PrototypeAST *ParsePrototype() {
1514 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1515 unsigned BinaryPrecedence = 30;
1519 return ErrorP("Expected function name in prototype");
1520 case tok_identifier:
1521 FnName = IdentifierStr;
1527 if (!isascii(CurTok))
1528 return ErrorP("Expected unary operator");
1530 FnName += (char)CurTok;
1536 if (!isascii(CurTok))
1537 return ErrorP("Expected binary operator");
1539 FnName += (char)CurTok;
1543 // Read the precedence if present.
1544 if (CurTok == tok_number) {
1545 if (NumVal < 1 || NumVal > 100)
1546 return ErrorP("Invalid precedence: must be 1..100");
1547 BinaryPrecedence = (unsigned)NumVal;
1554 return ErrorP("Expected '(' in prototype");
1556 std::vector<std::string> ArgNames;
1557 while (getNextToken() == tok_identifier)
1558 ArgNames.push_back(IdentifierStr);
1560 return ErrorP("Expected ')' in prototype");
1563 getNextToken(); // eat ')'.
1565 // Verify right number of names for operator.
1566 if (Kind && ArgNames.size() != Kind)
1567 return ErrorP("Invalid number of operands for operator");
1569 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1572 /// definition ::= 'def' prototype expression
1573 static FunctionAST *ParseDefinition() {
1574 getNextToken(); // eat def.
1575 PrototypeAST *Proto = ParsePrototype();
1576 if (Proto == 0) return 0;
1578 if (ExprAST *E = ParseExpression())
1579 return new FunctionAST(Proto, E);
1583 /// toplevelexpr ::= expression
1584 static FunctionAST *ParseTopLevelExpr() {
1585 if (ExprAST *E = ParseExpression()) {
1586 // Make an anonymous proto.
1587 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1588 return new FunctionAST(Proto, E);
1593 /// external ::= 'extern' prototype
1594 static PrototypeAST *ParseExtern() {
1595 getNextToken(); // eat extern.
1596 return ParsePrototype();
1599 //===----------------------------------------------------------------------===//
1601 //===----------------------------------------------------------------------===//
1603 static Module *TheModule;
1604 static LLVMFoldingBuilder Builder;
1605 static std::map<std::string, AllocaInst*> NamedValues;
1606 static FunctionPassManager *TheFPM;
1608 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1610 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1611 /// the function. This is used for mutable variables etc.
1612 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1613 const std::string &VarName) {
1614 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
1615 TheFunction->getEntryBlock().begin());
1616 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1620 Value *NumberExprAST::Codegen() {
1621 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1624 Value *VariableExprAST::Codegen() {
1625 // Look this variable up in the function.
1626 Value *V = NamedValues[Name];
1627 if (V == 0) return ErrorV("Unknown variable name");
1630 return Builder.CreateLoad(V, Name.c_str());
1633 Value *UnaryExprAST::Codegen() {
1634 Value *OperandV = Operand->Codegen();
1635 if (OperandV == 0) return 0;
1637 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1639 return ErrorV("Unknown unary operator");
1641 return Builder.CreateCall(F, OperandV, "unop");
1645 Value *BinaryExprAST::Codegen() {
1646 // Special case '=' because we don't want to emit the LHS as an expression.
1648 // Assignment requires the LHS to be an identifier.
1649 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
1651 return ErrorV("destination of '=' must be a variable");
1653 Value *Val = RHS->Codegen();
1654 if (Val == 0) return 0;
1656 // Look up the name.
1657 Value *Variable = NamedValues[LHSE->getName()];
1658 if (Variable == 0) return ErrorV("Unknown variable name");
1660 Builder.CreateStore(Val, Variable);
1665 Value *L = LHS->Codegen();
1666 Value *R = RHS->Codegen();
1667 if (L == 0 || R == 0) return 0;
1670 case '+': return Builder.CreateAdd(L, R, "addtmp");
1671 case '-': return Builder.CreateSub(L, R, "subtmp");
1672 case '*': return Builder.CreateMul(L, R, "multmp");
1674 L = Builder.CreateFCmpULT(L, R, "cmptmp");
1675 // Convert bool 0/1 to double 0.0 or 1.0
1676 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1680 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1682 Function *F = TheModule->getFunction(std::string("binary")+Op);
1683 assert(F && "binary operator not found!");
1685 Value *Ops[] = { L, R };
1686 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1689 Value *CallExprAST::Codegen() {
1690 // Look up the name in the global module table.
1691 Function *CalleeF = TheModule->getFunction(Callee);
1693 return ErrorV("Unknown function referenced");
1695 // If argument mismatch error.
1696 if (CalleeF->arg_size() != Args.size())
1697 return ErrorV("Incorrect # arguments passed");
1699 std::vector<Value*> ArgsV;
1700 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1701 ArgsV.push_back(Args[i]->Codegen());
1702 if (ArgsV.back() == 0) return 0;
1705 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1708 Value *IfExprAST::Codegen() {
1709 Value *CondV = Cond->Codegen();
1710 if (CondV == 0) return 0;
1712 // Convert condition to a bool by comparing non-equal to 0.0.
1713 CondV = Builder.CreateFCmpONE(CondV,
1714 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1717 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1719 // Create blocks for the then and else cases. Insert the 'then' block at the
1720 // end of the function.
1721 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1722 BasicBlock *ElseBB = new BasicBlock("else");
1723 BasicBlock *MergeBB = new BasicBlock("ifcont");
1725 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1728 Builder.SetInsertPoint(ThenBB);
1730 Value *ThenV = Then->Codegen();
1731 if (ThenV == 0) return 0;
1733 Builder.CreateBr(MergeBB);
1734 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1735 ThenBB = Builder.GetInsertBlock();
1738 TheFunction->getBasicBlockList().push_back(ElseBB);
1739 Builder.SetInsertPoint(ElseBB);
1741 Value *ElseV = Else->Codegen();
1742 if (ElseV == 0) return 0;
1744 Builder.CreateBr(MergeBB);
1745 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1746 ElseBB = Builder.GetInsertBlock();
1748 // Emit merge block.
1749 TheFunction->getBasicBlockList().push_back(MergeBB);
1750 Builder.SetInsertPoint(MergeBB);
1751 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1753 PN->addIncoming(ThenV, ThenBB);
1754 PN->addIncoming(ElseV, ElseBB);
1758 Value *ForExprAST::Codegen() {
1760 // var = alloca double
1762 // start = startexpr
1763 // store start -> var
1771 // endcond = endexpr
1773 // curvar = load var
1774 // nextvar = curvar + step
1775 // store nextvar -> var
1776 // br endcond, loop, endloop
1779 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1781 // Create an alloca for the variable in the entry block.
1782 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1784 // Emit the start code first, without 'variable' in scope.
1785 Value *StartVal = Start->Codegen();
1786 if (StartVal == 0) return 0;
1788 // Store the value into the alloca.
1789 Builder.CreateStore(StartVal, Alloca);
1791 // Make the new basic block for the loop header, inserting after current
1793 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1794 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1796 // Insert an explicit fall through from the current block to the LoopBB.
1797 Builder.CreateBr(LoopBB);
1799 // Start insertion in LoopBB.
1800 Builder.SetInsertPoint(LoopBB);
1802 // Within the loop, the variable is bound to its alloca. If it
1803 // shadows an existing variable, we have to restore it, so save it now.
1804 AllocaInst *OldVal = NamedValues[VarName];
1805 NamedValues[VarName] = Alloca;
1807 // Emit the body of the loop. This, like any other expr, can change the
1808 // current BB. Note that we ignore the value computed by the body, but don't
1810 if (Body->Codegen() == 0)
1813 // Emit the step value.
1816 StepVal = Step->Codegen();
1817 if (StepVal == 0) return 0;
1819 // If not specified, use 1.0.
1820 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1823 // Compute the end condition.
1824 Value *EndCond = End->Codegen();
1825 if (EndCond == 0) return EndCond;
1827 // Reload, increment, and restore the alloca. This handles the case where
1828 // the body of the loop mutates the variable.
1829 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1830 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1831 Builder.CreateStore(NextVar, Alloca);
1833 // Convert condition to a bool by comparing non-equal to 0.0.
1834 EndCond = Builder.CreateFCmpONE(EndCond,
1835 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1838 // Create the "after loop" block and insert it.
1839 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1840 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1842 // Insert the conditional branch into the end of LoopEndBB.
1843 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1845 // Any new code will be inserted in AfterBB.
1846 Builder.SetInsertPoint(AfterBB);
1848 // Restore the unshadowed variable.
1850 NamedValues[VarName] = OldVal;
1852 NamedValues.erase(VarName);
1855 // for expr always returns 0.0.
1856 return Constant::getNullValue(Type::DoubleTy);
1859 Value *VarExprAST::Codegen() {
1860 std::vector<AllocaInst *> OldBindings;
1862 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1864 // Register all variables and emit their initializer.
1865 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1866 const std::string &VarName = VarNames[i].first;
1867 ExprAST *Init = VarNames[i].second;
1869 // Emit the initializer before adding the variable to scope, this prevents
1870 // the initializer from referencing the variable itself, and permits stuff
1873 // var a = a in ... # refers to outer 'a'.
1876 InitVal = Init->Codegen();
1877 if (InitVal == 0) return 0;
1878 } else { // If not specified, use 0.0.
1879 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1882 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1883 Builder.CreateStore(InitVal, Alloca);
1885 // Remember the old variable binding so that we can restore the binding when
1887 OldBindings.push_back(NamedValues[VarName]);
1889 // Remember this binding.
1890 NamedValues[VarName] = Alloca;
1893 // Codegen the body, now that all vars are in scope.
1894 Value *BodyVal = Body->Codegen();
1895 if (BodyVal == 0) return 0;
1897 // Pop all our variables from scope.
1898 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1899 NamedValues[VarNames[i].first] = OldBindings[i];
1901 // Return the body computation.
1906 Function *PrototypeAST::Codegen() {
1907 // Make the function type: double(double,double) etc.
1908 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
1909 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1911 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1913 // If F conflicted, there was already something named 'Name'. If it has a
1914 // body, don't allow redefinition or reextern.
1915 if (F->getName() != Name) {
1916 // Delete the one we just made and get the existing one.
1917 F->eraseFromParent();
1918 F = TheModule->getFunction(Name);
1920 // If F already has a body, reject this.
1921 if (!F->empty()) {
1922 ErrorF("redefinition of function");
1926 // If F took a different number of args, reject.
1927 if (F->arg_size() != Args.size()) {
1928 ErrorF("redefinition of function with different # args");
1933 // Set names for all arguments.
1935 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1937 AI->setName(Args[Idx]);
1942 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1943 /// argument in the symbol table so that references to it will succeed.
1944 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1945 Function::arg_iterator AI = F->arg_begin();
1946 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1947 // Create an alloca for this variable.
1948 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1950 // Store the initial value into the alloca.
1951 Builder.CreateStore(AI, Alloca);
1953 // Add arguments to variable symbol table.
1954 NamedValues[Args[Idx]] = Alloca;
1959 Function *FunctionAST::Codegen() {
1960 NamedValues.clear();
1962 Function *TheFunction = Proto->Codegen();
1963 if (TheFunction == 0)
1966 // If this is an operator, install it.
1967 if (Proto->isBinaryOp())
1968 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1970 // Create a new basic block to start insertion into.
1971 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1972 Builder.SetInsertPoint(BB);
1974 // Add all arguments to the symbol table and create their allocas.
1975 Proto->CreateArgumentAllocas(TheFunction);
1977 if (Value *RetVal = Body->Codegen()) {
1978 // Finish off the function.
1979 Builder.CreateRet(RetVal);
1981 // Validate the generated code, checking for consistency.
1982 verifyFunction(*TheFunction);
1984 // Optimize the function.
1985 TheFPM->run(*TheFunction);
1990 // Error reading body, remove function.
1991 TheFunction->eraseFromParent();
1993 if (Proto->isBinaryOp())
1994 BinopPrecedence.erase(Proto->getOperatorName());
1998 //===----------------------------------------------------------------------===//
1999 // Top-Level parsing and JIT Driver
2000 //===----------------------------------------------------------------------===//
2002 static ExecutionEngine *TheExecutionEngine;
2004 static void HandleDefinition() {
2005 if (FunctionAST *F = ParseDefinition()) {
2006 if (Function *LF = F->Codegen()) {
2007 fprintf(stderr, "Read function definition:");
2011 // Skip token for error recovery.
2016 static void HandleExtern() {
2017 if (PrototypeAST *P = ParseExtern()) {
2018 if (Function *F = P->Codegen()) {
2019 fprintf(stderr, "Read extern: ");
2023 // Skip token for error recovery.
2028 static void HandleTopLevelExpression() {
2029 // Evaluate a top level expression into an anonymous function.
2030 if (FunctionAST *F = ParseTopLevelExpr()) {
2031 if (Function *LF = F->Codegen()) {
2032 // JIT the function, returning a function pointer.
2033 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
2035 // Cast it to the right type (takes no arguments, returns a double) so we
2036 // can call it as a native function.
2037 double (*FP)() = (double (*)())FPtr;
2038 fprintf(stderr, "Evaluated to %f\n", FP());
2041 // Skip token for error recovery.
2046 /// top ::= definition | external | expression | ';'
2047 static void MainLoop() {
2049 fprintf(stderr, "ready> ");
2051 case tok_eof: return;
2052 case ';': getNextToken(); break; // ignore top level semicolons.
2053 case tok_def: HandleDefinition(); break;
2054 case tok_extern: HandleExtern(); break;
2055 default: HandleTopLevelExpression(); break;
2062 //===----------------------------------------------------------------------===//
2063 // "Library" functions that can be "extern'd" from user code.
2064 //===----------------------------------------------------------------------===//
2066 /// putchard - putchar that takes a double and returns 0.
2068 double putchard(double X) {
2073 /// printd - printf that takes a double and prints it as "%f\n", returning 0.
2075 double printd(double X) {
2080 //===----------------------------------------------------------------------===//
2081 // Main driver code.
2082 //===----------------------------------------------------------------------===//
2085 // Install standard binary operators.
2086 // 1 is lowest precedence.
2087 BinopPrecedence['='] = 2;
2088 BinopPrecedence['<'] = 10;
2089 BinopPrecedence['+'] = 20;
2090 BinopPrecedence['-'] = 20;
2091 BinopPrecedence['*'] = 40; // highest.
2093 // Prime the first token.
2094 fprintf(stderr, "ready> ");
2097 // Make the module, which holds all the code.
2098 TheModule = new Module("my cool jit");
2101 TheExecutionEngine = ExecutionEngine::create(TheModule);
2104 ExistingModuleProvider OurModuleProvider(TheModule);
2105 FunctionPassManager OurFPM(&OurModuleProvider);
2107 // Set up the optimizer pipeline. Start with registering info about how the
2108 // target lays out data structures.
2109 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
2110 // Promote allocas to registers.
2111 OurFPM.add(createPromoteMemoryToRegisterPass());
2112 // Do simple "peephole" optimizations and bit-twiddling optzns.
2113 OurFPM.add(createInstructionCombiningPass());
2114 // Reassociate expressions.
2115 OurFPM.add(createReassociatePass());
2116 // Eliminate Common SubExpressions.
2117 OurFPM.add(createGVNPass());
2118 // Simplify the control flow graph (deleting unreachable blocks, etc).
2119 OurFPM.add(createCFGSimplificationPass());
2121 // Set the global so the code gen can use this.
2122 TheFPM = &OurFPM;
2124 // Run the main "interpreter loop" now.
2128 } // Free module provider and pass manager.
2131 // Print out all of the generated code.
2132 TheModule->dump();
2140 <!-- *********************************************************************** -->
2148 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2149 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2150 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $