1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
2 "http://www.w3.org/TR/html4/strict.dtd">
6 <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta name="author" content="Chris Lattner">
10 <link rel="stylesheet" href="../llvm.css" type="text/css">
15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
17 <div class="doc_author">
18 <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
21 <!-- *********************************************************************** -->
22 <div class="doc_section"><a name="intro">Part 7 Introduction</a></div>
23 <!-- *********************************************************************** -->
25 <div class="doc_text">
27 <p>Welcome to Part 7 of the "<a href="index.html">Implementing a language with
28 LLVM</a>" tutorial. In parts 1 through 6, we've built a very respectable,
30 href="http://en.wikipedia.org/wiki/Functional_programming">functional
31 programming language</a>. In our journey, we learned some parsing techniques,
32 how to build and represent an AST, how to build LLVM IR, and how to optimize
33 the resultant code and JIT compile it.</p>
<p>While Kaleidoscope is interesting as a functional language, being functional
makes it "too easy" to generate LLVM IR for it. In particular, a functional language
37 makes it very easy to build LLVM IR directly in <a
38 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
39 Since LLVM requires that the input code be in SSA form, this is a very nice
40 property and it is often unclear to newcomers how to generate code for an
41 imperative language with mutable variables.</p>
43 <p>The short (and happy) summary of this chapter is that there is no need for
44 your front-end to build SSA form: LLVM provides highly tuned and well tested
45 support for this, though the way it works is a bit unexpected for some.</p>
49 <!-- *********************************************************************** -->
50 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
51 <!-- *********************************************************************** -->
53 <div class="doc_text">
56 To understand why mutable variables cause complexities in SSA construction,
57 consider this extremely simple C example:
60 <div class="doc_code">
63 int test(_Bool Condition) {
74 <p>In this case, we have the variable "X", whose value depends on the path
75 executed in the program. Because there are two different possible values for X
76 before the return instruction, a PHI node is inserted to merge the two values.
77 The LLVM IR that we want for this example looks like this:</p>
79 <div class="doc_code">
81 @G = weak global i32 0 ; type of @G is i32*
82 @H = weak global i32 0 ; type of @H is i32*
84 define i32 @test(i1 %Condition) {
86 br i1 %Condition, label %cond_true, label %cond_false
97 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
103 <p>In this example, the loads from the G and H global variables are explicit in
104 the LLVM IR, and they live in the then/else branches of the if statement
105 (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
106 in the cond_next block selects the right value to use based on where control
107 flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
109 the value of X.0. The intent of this chapter is not to explain the details of
110 SSA form. For more information, see one of the many <a
111 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
114 <p>The question for this article is "who places phi nodes when lowering
115 assignments to mutable variables?". The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-SSA" mode for it.
117 However, SSA construction requires non-trivial algorithms and data structures,
118 so it is inconvenient and wasteful for every front-end to have to reproduce this
123 <!-- *********************************************************************** -->
124 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
125 <!-- *********************************************************************** -->
127 <div class="doc_text">
129 <p>The 'trick' here is that while LLVM does require all register values to be
130 in SSA form, it does not require (or permit) memory objects to be in SSA form.
131 In the example above, note that the loads from G and H are direct accesses to
132 G and H: they are not renamed or versioned. This differs from some other
133 compiler systems, which do try to version memory objects. In LLVM, instead of
134 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
135 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
139 With this in mind, the high-level idea is that we want to make a stack variable
140 (which lives in memory, because it is on the stack) for each mutable object in
141 a function. To take advantage of this trick, we need to talk about how LLVM
142 represents stack variables.
145 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
146 it is carefully designed to not have (or need) an "address-of" operator. Notice
147 how the type of the @G/@H global variables is actually "i32*" even though the
148 variable is defined as "i32". What this means is that @G defines <em>space</em>
149 for an i32 in the global data area, but its <em>name</em> actually refers to the
150 address for that space. Stack variables work the same way, but instead of being
151 declared with global variable definitions, they are declared with the
152 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
154 <div class="doc_code">
156 define i32 @test(i1 %Condition) {
158 %X = alloca i32 ; type of %X is i32*.
160 %tmp = load i32* %X ; load the stack value %X from the stack.
161 %tmp2 = add i32 %tmp, 1 ; increment it
162 store i32 %tmp2, i32* %X ; store it back
167 <p>This code shows an example of how you can declare and manipulate a stack
168 variable in the LLVM IR. Stack memory allocated with the alloca instruction is
169 fully general: you can pass the address of the stack slot to functions, you can
170 store it in other variables, etc. In our example above, we could rewrite the
171 example to use the alloca technique to avoid using a PHI node:</p>
173 <div class="doc_code">
175 @G = weak global i32 0 ; type of @G is i32*
176 @H = weak global i32 0 ; type of @H is i32*
178 define i32 @test(i1 %Condition) {
180 %X = alloca i32 ; type of %X is i32*.
181 br i1 %Condition, label %cond_true, label %cond_false
185 store i32 %X.0, i32* %X ; Update X
190 store i32 %X.1, i32* %X ; Update X
194 %X.2 = load i32* %X ; Read X
200 <p>With this, we have discovered a way to handle arbitrary mutable variables
201 without the need to create Phi nodes at all:</p>
204 <li>Each mutable variable becomes a stack allocation.</li>
205 <li>Each read of the variable becomes a load from the stack.</li>
206 <li>Each update of the variable becomes a store to the stack.</li>
207 <li>Taking the address of a variable just uses the stack address directly.</li>
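<p>As a rough illustration of these rules, the following self-contained C++
sketch emits IR-like text for declaring, reading and writing a mutable variable.
The <tt>SlotLowering</tt> class and its method names are hypothetical and exist
only for this example; it does not use the LLVM API, and the strings it produces
merely imitate the syntax shown above.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM: lowers variable accesses to
// IR-like text using one stack slot per mutable variable.
class SlotLowering {
  std::vector<std::string> Lines; // emitted pseudo-IR, in order
  unsigned NextTmp = 0;           // counter for fresh temporary names

public:
  // Rule 1: each mutable variable becomes a stack allocation.
  // Rule 4: the returned slot name doubles as the variable's address.
  std::string declare(const std::string &Name) {
    std::string Slot = "%" + Name;
    Lines.push_back(Slot + " = alloca i32");
    return Slot;
  }

  // Rule 2: each read of the variable becomes a load from the slot.
  std::string read(const std::string &Slot) {
    std::string Tmp = "%tmp" + std::to_string(NextTmp++);
    Lines.push_back(Tmp + " = load i32* " + Slot);
    return Tmp;
  }

  // Rule 3: each update of the variable becomes a store to the slot.
  void write(const std::string &Slot, const std::string &Value) {
    Lines.push_back("store i32 " + Value + ", i32* " + Slot);
  }

  const std::vector<std::string> &lines() const { return Lines; }
};
```

<p>Note that nothing in this sketch needs to know about control flow: the same
three operations are emitted no matter which branch the access sits in, which is
why no Phi node has to be placed by hand.</p>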
210 <p>While this solution has solved our immediate problem, it introduced another
211 one: we have now apparently introduced a lot of stack traffic for very simple
212 and common operations, a major performance problem. Fortunately for us, the
213 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
214 this case, promoting allocas like this into SSA registers, inserting Phi nodes
215 as appropriate. If you run this example through the pass, for example, you'll
218 <div class="doc_code">
220 $ <b>llvm-as < example.ll | opt -mem2reg | llvm-dis</b>
221 @G = weak global i32 0
222 @H = weak global i32 0
224 define i32 @test(i1 %Condition) {
226 br i1 %Condition, label %cond_true, label %cond_false
237 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
243 <p>The mem2reg pass implements the standard "iterated dominator frontier"
244 algorithm for constructing SSA form and has a number of optimizations that speed
245 up very common degenerate cases. mem2reg really is the answer for dealing with
246 mutable variables, and we highly recommend that you depend on it. Note that
247 mem2reg only works on variables in certain circumstances:</p>
250 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
251 promotes them. It does not apply to global variables or heap allocations.</li>
253 <li>mem2reg only looks for alloca instructions in the entry block of the
254 function. Being in the entry block guarantees that the alloca is only executed
255 once, which makes analysis simpler.</li>
257 <li>mem2reg only promotes allocas whose uses are direct loads and stores. If
258 the address of the stack object is passed to a function, or if any funny pointer
259 arithmetic is involved, the alloca will not be promoted.</li>
261 <li>mem2reg only works on allocas of scalar values, and only if the array size
262 of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of
263 promoting structs or arrays to registers. Note that the "scalarrepl" pass is
264 more powerful and can promote structs, "unions", and arrays in many cases.</li>
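<p>The conditions above can be read as a checklist. The C++ sketch below
encodes them as a predicate over a hypothetical <tt>AllocaSummary</tt> record;
the struct, its fields and the function name are invented for illustration and
are not LLVM data structures. It assumes the object in question is already known
to be an alloca (not a global or a heap allocation).</p>

```cpp
// Hypothetical summary of one alloca; the field names are invented for this
// sketch and do not correspond to LLVM data structures.
struct AllocaSummary {
  bool InEntryBlock;        // the alloca sits in the function's entry block
  bool OnlyDirectLoadStore; // address never escapes; no pointer arithmetic
  bool IsScalar;            // not a struct or array type
  unsigned ArraySize;       // element count of the allocation (1 if omitted)
};

// Returns true when every condition from the list above holds.
bool looksPromotable(const AllocaSummary &A) {
  return A.InEntryBlock && A.OnlyDirectLoadStore && A.IsScalar &&
         A.ArraySize == 1;
}
```

<p>A front-end that creates one scalar alloca per variable in the entry block,
and only ever loads from and stores to it, satisfies all four fields by
construction.</p>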
269 All of these properties are easy to satisfy for most imperative languages, and
270 we'll illustrate this below with Kaleidoscope. The final question you may be
271 asking is: should I bother with this nonsense for my front-end? Wouldn't it be
272 better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that you use this technique
274 for building SSA form, unless there is an extremely good reason not to. Using
275 this technique is:</p>
278 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
279 mutable variables. As such, the most common clients of LLVM are using this to
handle the bulk of their variables. You can be sure that bugs are found fast and
283 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
284 common cases as well as fully general. For example, it has fast-paths for
285 variables that are only used in a single block, variables that only have one
286 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
289 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
290 Debug information in LLVM</a> relies on having the address of the variable
291 exposed to attach debug info to it. This technique dovetails very naturally
292 with this style of debug info.</li>
295 <p>If nothing else, this makes it much easier to get your front-end up and
running, and is very simple to implement. Let's extend Kaleidoscope with mutable
302 <!-- *********************************************************************** -->
303 <div class="doc_section"><a name="kalvars">Mutable Variables in
304 Kaleidoscope</a></div>
305 <!-- *********************************************************************** -->
307 <div class="doc_text">
<p>Now that we know the sort of problem we want to tackle, let's see what this
310 looks like in the context of our little Kaleidoscope language. We're going to
311 add two features:</p>
314 <li>The ability to mutate variables with the '=' operator.</li>
315 <li>The ability to define new variables.</li>
318 <p>While the first item is really what this is about, we only have variables
319 for incoming arguments and for induction variables, and redefining them only
320 goes so far :). Also, the ability to define new variables is a
321 useful thing regardless of whether you will be mutating them. Here's a
322 motivating example that shows how we could use these:</p>
324 <div class="doc_code">
326 # Define ':' for sequencing: as a low-precedence operator that ignores operands
327 # and just returns the RHS.
328 def binary : 1 (x y) y;
330 # Recursive fib, we could do this before.
339 <b>var a = 1, b = 1, c in</b>
(for i = 3, i &lt; x in
352 In order to mutate variables, we have to change our existing variables to use
353 the "alloca trick". Once we have that, we'll add our new operator, then extend
354 Kaleidoscope to support new variable definitions.
359 <!-- *********************************************************************** -->
360 <div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
362 <!-- *********************************************************************** -->
364 <div class="doc_text">
367 The symbol table in Kaleidoscope is managed at code generation time by the
368 '<tt>NamedValues</tt>' map. This map currently keeps track of the LLVM "Value*"
369 that holds the double value for the named variable. In order to support
mutation, we need to change this slightly, so that <tt>NamedValues</tt> holds
371 the <em>memory location</em> of the variable in question. Note that this
372 change is a refactoring: it changes the structure of the code, but does not
373 (by itself) change the behavior of the compiler. All of these changes are
374 isolated in the Kaleidoscope code generator.</p>
377 At this point in Kaleidoscope's development, it only supports variables for two
378 things: incoming arguments to functions and the induction variable of 'for'
379 loops. For consistency, we'll allow mutation of these variables in addition to
380 other user-defined variables. This means that these will both need memory
384 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
385 map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
will tell us what parts of the code we need to update:</p>
388 <div class="doc_code">
390 static std::map<std::string, AllocaInst*> NamedValues;
<p>Also, since we will need to create these allocas, we'll use a helper
395 function that ensures that the allocas are created in the entry block of the
398 <div class="doc_code">
400 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
401 /// the function. This is used for mutable variables etc.
402 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
403 const std::string &VarName) {
404 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
405 TheFunction->getEntryBlock().begin());
406 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
411 <p>This funny looking code creates an LLVMBuilder object that is pointing at
412 the first instruction (.begin()) of the entry block. It then creates an alloca
413 with the expected name and returns it. Because all values in Kaleidoscope are
414 doubles, there is no need to pass in a type to use.</p>
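<p>To make the "insert at the front of the entry block" idea concrete without
depending on the LLVM API, here is a minimal C++ sketch. <tt>ToyBlock</tt> and
<tt>createEntryBlockSlot</tt> are hypothetical names used only here: the block
is modelled as a list of instruction strings, and the new alloca is always
placed before whatever the block already contains.</p>

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for a basic block: an ordered list of instruction text.
struct ToyBlock {
  std::list<std::string> Insts;
};

// Insert an alloca for VarName before the first instruction of the entry
// block, regardless of how many instructions the block already holds.
std::string createEntryBlockSlot(ToyBlock &Entry, const std::string &VarName) {
  std::string Slot = "%" + VarName;
  Entry.Insts.insert(Entry.Insts.begin(), Slot + " = alloca double");
  return Slot;
}
```

<p>Because the insertion point is independent of where the rest of the code is
currently being emitted, a variable introduced deep inside a loop body still
gets its slot in the entry block.</p>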
416 <p>With this in place, the first functionality change we want to make is to
417 variable references. In our new scheme, variables live on the stack, so code
418 generating a reference to them actually needs to produce a load from the stack
421 <div class="doc_code">
423 Value *VariableExprAST::Codegen() {
424 // Look this variable up in the function.
425 Value *V = NamedValues[Name];
426 if (V == 0) return ErrorV("Unknown variable name");
429 return Builder.CreateLoad(V, Name.c_str());
<p>As you can see, this is pretty straightforward. Next we need to update the
435 things that define the variables to set up the alloca. We'll start with
436 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
437 the unabridged code):</p>
439 <div class="doc_code">
441 Function *TheFunction = Builder.GetInsertBlock()->getParent();
443 <b>// Create an alloca for the variable in the entry block.
444 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
446 // Emit the start code first, without 'variable' in scope.
447 Value *StartVal = Start->Codegen();
448 if (StartVal == 0) return 0;
450 <b>// Store the value into the alloca.
451 Builder.CreateStore(StartVal, Alloca);</b>
454 // Compute the end condition.
455 Value *EndCond = End->Codegen();
456 if (EndCond == 0) return EndCond;
458 <b>// Reload, increment, and restore the alloca. This handles the case where
459 // the body of the loop mutates the variable.
460 Value *CurVar = Builder.CreateLoad(Alloca);
461 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
462 Builder.CreateStore(NextVar, Alloca);</b>
467 <p>This code is virtually identical to the code <a
468 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>. The
469 big difference is that we no longer have to construct a PHI node, and we use
470 load/store to access the variable as needed.</p>
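<p>The effect of reloading the variable before incrementing it can be seen in a
small simulation. The C++ sketch below is not generated code: a plain
<tt>double</tt> stands in for the alloca, and <tt>runToyLoop</tt> is a
hypothetical function that follows the order of operations in the excerpt
above, assuming the loop body is emitted before the end condition as in the
earlier version of the loop.</p>

```cpp
#include <functional>

// Simulates the emitted loop shape with a plain double standing in for the
// alloca. Returns how many times the body ran.
int runToyLoop(double Start, double Step,
               const std::function<bool(double)> &EndCond,
               const std::function<void(double &)> &Body) {
  double Slot = Start; // store StartVal into the alloca
  int Iterations = 0;
  while (true) {
    Body(Slot);                    // the body may load from and store to the slot
    ++Iterations;
    bool Continue = EndCond(Slot); // the end condition reads the current value
    double CurVar = Slot;          // reload: picks up any mutation by the body
    double NextVar = CurVar + Step;
    Slot = NextVar;                // store the incremented value back
    if (!Continue)
      break;
  }
  return Iterations;
}
```

<p>With a start of 1, a step of 1 and the condition <tt>i &lt; 5</tt>, a body
that leaves the variable alone runs five times in this sketch, while a body that
also adds one to the variable runs three times, because the increment starts
from the value the body stored.</p>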
472 <p>To support mutable argument variables, we need to also make allocas for them.
473 The code for this is also pretty simple:</p>
475 <div class="doc_code">
477 /// CreateArgumentAllocas - Create an alloca for each argument and register the
478 /// argument in the symbol table so that references to it will succeed.
479 void PrototypeAST::CreateArgumentAllocas(Function *F) {
480 Function::arg_iterator AI = F->arg_begin();
481 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
482 // Create an alloca for this variable.
483 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
485 // Store the initial value into the alloca.
486 Builder.CreateStore(AI, Alloca);
488 // Add arguments to variable symbol table.
489 NamedValues[Args[Idx]] = Alloca;
495 <p>For each argument, we make an alloca, store the input value to the function
496 into the alloca, and register the alloca as the memory location for the
497 argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
498 it sets up the entry block for the function.</p>
500 <p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
501 good codegen once again:</p>
503 <div class="doc_code">
505 // Set up the optimizer pipeline. Start with registering info about how the
506 // target lays out data structures.
507 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
508 <b>// Promote allocas to registers.
509 OurFPM.add(createPromoteMemoryToRegisterPass());</b>
510 // Do simple "peephole" optimizations and bit-twiddling optzns.
511 OurFPM.add(createInstructionCombiningPass());
512 // Reassociate expressions.
513 OurFPM.add(createReassociatePass());
517 <p>It is interesting to see what the code looks like before and after the
518 mem2reg optimization runs. For example, this is the before/after code for our
519 recursive fib. Before the optimization:</p>
521 <div class="doc_code">
523 define double @fib(double %x) {
525 <b>%x1 = alloca double
526 store double %x, double* %x1
527 %x2 = load double* %x1</b>
528 %multmp = fcmp ult double %x2, 3.000000e+00
529 %booltmp = uitofp i1 %multmp to double
530 %ifcond = fcmp one double %booltmp, 0.000000e+00
531 br i1 %ifcond, label %then, label %else
533 then: ; preds = %entry
536 else: ; preds = %entry
537 <b>%x3 = load double* %x1</b>
538 %subtmp = sub double %x3, 1.000000e+00
539 %calltmp = call double @fib( double %subtmp )
540 <b>%x4 = load double* %x1</b>
541 %subtmp5 = sub double %x4, 2.000000e+00
542 %calltmp6 = call double @fib( double %subtmp5 )
543 %addtmp = add double %calltmp, %calltmp6
546 ifcont: ; preds = %else, %then
547 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
553 <p>Here there is only one variable (x, the input argument) but you can still
554 see the extremely simple-minded code generation strategy we are using. In the
555 entry block, an alloca is created, and the initial input value is stored into
556 it. Each reference to the variable does a reload from the stack. Also, note
557 that we didn't modify the if/then/else expression, so it still inserts a PHI
558 node. While we could make an alloca for it, it is actually easier to create a
559 PHI node for it, so we still just make the PHI.</p>
561 <p>Here is the code after the mem2reg pass runs:</p>
563 <div class="doc_code">
565 define double @fib(double %x) {
567 %multmp = fcmp ult double <b>%x</b>, 3.000000e+00
568 %booltmp = uitofp i1 %multmp to double
569 %ifcond = fcmp one double %booltmp, 0.000000e+00
570 br i1 %ifcond, label %then, label %else
576 %subtmp = sub double <b>%x</b>, 1.000000e+00
577 %calltmp = call double @fib( double %subtmp )
578 %subtmp5 = sub double <b>%x</b>, 2.000000e+00
579 %calltmp6 = call double @fib( double %subtmp5 )
580 %addtmp = add double %calltmp, %calltmp6
583 ifcont: ; preds = %else, %then
584 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
590 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
591 variable. The point of showing this is to calm your tension about inserting
such blatant inefficiencies :).</p>
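<p>To show why this case is trivial, here is a small C++ sketch of promoting a
slot that is stored to exactly once, before any load, in straight-line code.
It is not the algorithm mem2reg implements: the <tt>Inst</tt> record, the
<tt>Op</tt> enum and <tt>promoteSingleStore</tt> are hypothetical, and the
sketch ignores control flow and dominance entirely.</p>

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-IR for this sketch only.
enum class Op { Alloca, Store, Load, Other };

struct Inst {
  Op Kind;
  std::string Result;  // name defined by the instruction (may be empty)
  std::string Operand; // value stored (Store) or value used (Other)
  std::string Slot;    // alloca name for Alloca/Store/Load
};

// Promote a slot that is stored to exactly once, before all of its loads, in
// straight-line code: every load result is replaced by the stored value.
std::vector<Inst> promoteSingleStore(const std::vector<Inst> &In,
                                     const std::string &Slot) {
  std::string Stored;                         // the single stored value
  std::map<std::string, std::string> Renames; // load result -> stored value
  std::vector<Inst> Out;
  for (const Inst &I : In) {
    if (I.Kind == Op::Alloca && I.Result == Slot)
      continue; // drop the alloca
    if (I.Kind == Op::Store && I.Slot == Slot) {
      Stored = I.Operand; // remember the value, drop the store
      continue;
    }
    if (I.Kind == Op::Load && I.Slot == Slot) {
      Renames[I.Result] = Stored; // drop the load, rename its result
      continue;
    }
    Inst Copy = I;
    auto It = Renames.find(Copy.Operand);
    if (It != Renames.end())
      Copy.Operand = It->second; // use the stored value directly
    Out.push_back(Copy);
  }
  return Out;
}
```

<p>Applied to a sequence shaped like the <tt>fib</tt> entry block above (one
alloca, one store of <tt>%x</tt>, then loads), every use of a loaded value ends
up referring to <tt>%x</tt> itself and no Phi node is required.</p>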
594 <p>After the rest of the optimizers run, we get:</p>
596 <div class="doc_code">
598 define double @fib(double %x) {
600 %multmp = fcmp ult double %x, 3.000000e+00
601 %booltmp = uitofp i1 %multmp to double
602 %ifcond = fcmp ueq double %booltmp, 0.000000e+00
603 br i1 %ifcond, label %else, label %ifcont
606 %subtmp = sub double %x, 1.000000e+00
607 %calltmp = call double @fib( double %subtmp )
608 %subtmp5 = sub double %x, 2.000000e+00
609 %calltmp6 = call double @fib( double %subtmp5 )
610 %addtmp = add double %calltmp, %calltmp6
614 ret double 1.000000e+00
619 <p>Here we see that the simplifycfg pass decided to clone the return instruction
620 into the end of the 'else' block. This allowed it to eliminate some branches
621 and the PHI node.</p>
623 <p>Now that all symbol table references are updated to use stack variables,
624 we'll add the assignment operator.</p>
628 <!-- *********************************************************************** -->
629 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
630 <!-- *********************************************************************** -->
632 <div class="doc_text">
634 <p>With our current framework, adding a new assignment operator is really
635 simple. We will parse it just like any other binary operator, but handle it
636 internally (instead of allowing the user to define it). The first step is to
637 set a precedence:</p>
639 <div class="doc_code">
642 // Install standard binary operators.
643 // 1 is lowest precedence.
644 <b>BinopPrecedence['='] = 2;</b>
645 BinopPrecedence['<'] = 10;
646 BinopPrecedence['+'] = 20;
647 BinopPrecedence['-'] = 20;
651 <p>Now that the parser knows the precedence of the binary operator, it takes
652 care of all the parsing and AST generation. We just need to implement codegen
653 for the assignment operator. This looks like:</p>
655 <div class="doc_code">
657 Value *BinaryExprAST::Codegen() {
658 // Special case '=' because we don't want to emit the LHS as an expression.
660 // Assignment requires the LHS to be an identifier.
661 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
663 return ErrorV("destination of '=' must be a variable");
667 <p>Unlike the rest of the binary operators, our assignment operator doesn't
668 follow the "emit LHS, emit RHS, do computation" model. As such, it is handled
669 as a special case before the other binary operators are handled. The other
670 strange thing about it is that it requires the LHS to be a variable directly.
673 <div class="doc_code">
676 Value *Val = RHS->Codegen();
677 if (Val == 0) return 0;
680 Value *Variable = NamedValues[LHSE->getName()];
681 if (Variable == 0) return ErrorV("Unknown variable name");
683 Builder.CreateStore(Val, Variable);
<p>Once it has the variable, codegen'ing the assignment is straightforward:
691 we emit the RHS of the assignment, create a store, and return the computed
692 value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>
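<p>The "store, then return the stored value" behaviour is what makes chaining
work. The following C++ sketch models it with a hypothetical <tt>ToyEnv</tt>
type in which a <tt>double</tt> stands in for each stack slot; it is an
illustration of the semantics, not of the code generator itself.</p>

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical symbol table for this sketch: variable name -> current value.
// In the tutorial the map holds allocas; a double stands in for the slot here.
struct ToyEnv {
  std::map<std::string, double> Slots;

  // Store Val into an existing variable and return Val, so that the result
  // of one assignment can feed another.
  double assign(const std::string &Name, double Val) {
    auto It = Slots.find(Name);
    if (It == Slots.end())
      throw std::runtime_error("Unknown variable name");
    It->second = Val; // the "store"
    return Val;       // the value of the assignment expression
  }
};
```

<p>With this shape, <tt>E.assign("x", E.assign("y", 4.0))</tt> sets both
variables, mirroring "X = (Y = Z)". Assigning to a name that was never defined
is reported as an error rather than silently creating a variable.</p>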
694 <p>Now that we have an assignment operator, we can mutate loop variables and
695 arguments. For example, we can now run code like this:</p>
697 <div class="doc_code">
699 # Function to print a double.
702 # Define ':' for sequencing: as a low-precedence operator that ignores operands
703 # and just returns the RHS.
704 def binary : 1 (x y) y;
715 <p>When run, this example prints "123" and then "4", showing that we did
716 actually mutate the value! Okay, we have now officially implemented our goal:
717 getting this to work requires SSA construction in the general case. However,
to be really useful, we want the ability to define our own local variables, let's
724 <!-- *********************************************************************** -->
725 <div class="doc_section"><a name="localvars">User-defined Local
727 <!-- *********************************************************************** -->
729 <div class="doc_text">
<p>Adding var/in is just like the other extensions we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
The first step for adding our new 'var/in' construct is to extend the lexer.
As before, this is pretty trivial; the code looks like this:</p>
736 <div class="doc_code">
745 static int gettok() {
747 if (IdentifierStr == "in") return tok_in;
748 if (IdentifierStr == "binary") return tok_binary;
749 if (IdentifierStr == "unary") return tok_unary;
750 <b>if (IdentifierStr == "var") return tok_var;</b>
751 return tok_identifier;
756 <p>The next step is to define the AST node that we will construct. For var/in,
757 it will look like this:</p>
759 <div class="doc_code">
761 /// VarExprAST - Expression class for var/in
762 class VarExprAST : public ExprAST {
763 std::vector<std::pair<std::string, ExprAST*> > VarNames;
766 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
768 : VarNames(varnames), Body(body) {}
770 virtual Value *Codegen();
775 <p>var/in allows a list of names to be defined all at once, and each name can
776 optionally have an initializer value. As such, we capture this information in
the VarNames vector. Also, var/in has a body, and this body is allowed to access
the variables defined by the var/in.</p>
780 <p>With this ready, we can define the parser pieces. First thing we do is add
781 it as a primary expression:</p>
783 <div class="doc_code">
786 /// ::= identifierexpr
791 <b>/// ::= varexpr</b>
792 static ExprAST *ParsePrimary() {
794 default: return Error("unknown token when expecting an expression");
795 case tok_identifier: return ParseIdentifierExpr();
796 case tok_number: return ParseNumberExpr();
797 case '(': return ParseParenExpr();
798 case tok_if: return ParseIfExpr();
799 case tok_for: return ParseForExpr();
800 <b>case tok_var: return ParseVarExpr();</b>
806 <p>Next we define ParseVarExpr:</p>
808 <div class="doc_code">
/// varexpr ::= 'var' identifier ('=' expression)?
// (',' identifier ('=' expression)?)* 'in' expression
812 static ExprAST *ParseVarExpr() {
813 getNextToken(); // eat the var.
815 std::vector<std::pair<std::string, ExprAST*> > VarNames;
817 // At least one variable name is required.
818 if (CurTok != tok_identifier)
819 return Error("expected identifier after var");
823 <p>The first part of this code parses the list of identifier/expr pairs into the
824 local <tt>VarNames</tt> vector.
826 <div class="doc_code">
829 std::string Name = IdentifierStr;
getNextToken(); // eat identifier.
832 // Read the optional initializer.
835 getNextToken(); // eat the '='.
837 Init = ParseExpression();
838 if (Init == 0) return 0;
841 VarNames.push_back(std::make_pair(Name, Init));
843 // End of var list, exit loop.
844 if (CurTok != ',') break;
845 getNextToken(); // eat the ','.
847 if (CurTok != tok_identifier)
848 return Error("expected identifier list after var");
853 <p>Once all the variables are parsed, we then parse the body and create the
856 <div class="doc_code">
858 // At this point, we have to have 'in'.
859 if (CurTok != tok_in)
860 return Error("expected 'in' keyword after 'var'");
861 getNextToken(); // eat 'in'.
863 ExprAST *Body = ParseExpression();
864 if (Body == 0) return 0;
866 return new VarExprAST(VarNames, Body);
871 <p>Now that we can parse and represent the code, we need to support emission of
872 LLVM IR for it. This code starts out with:</p>
874 <div class="doc_code">
876 Value *VarExprAST::Codegen() {
877 std::vector<AllocaInst *> OldBindings;
879 Function *TheFunction = Builder.GetInsertBlock()->getParent();
881 // Register all variables and emit their initializer.
882 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
883 const std::string &VarName = VarNames[i].first;
884 ExprAST *Init = VarNames[i].second;
888 <p>Basically it loops over all the variables, installing them one at a time.
889 For each variable we put into the symbol table, we remember the previous value
890 that we replace in OldBindings.</p>
892 <div class="doc_code">
894 // Emit the initializer before adding the variable to scope, this prevents
895 // the initializer from referencing the variable itself, and permits stuff
898 // var a = a in ... # refers to outer 'a'.
901 InitVal = Init->Codegen();
902 if (InitVal == 0) return 0;
903 } else { // If not specified, use 0.0.
904 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
907 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
908 Builder.CreateStore(InitVal, Alloca);
910 // Remember the old variable binding so that we can restore the binding when
912 OldBindings.push_back(NamedValues[VarName]);
914 // Remember this binding.
915 NamedValues[VarName] = Alloca;
920 <p>There are more comments here than code. The basic idea is that we emit the
921 initializer, create the alloca, then update the symbol table to point to it.
922 Once all the variables are installed in the symbol table, we evaluate the body
923 of the var/in expression:</p>
925 <div class="doc_code">
927 // Codegen the body, now that all vars are in scope.
928 Value *BodyVal = Body->Codegen();
929 if (BodyVal == 0) return 0;
933 <p>Finally, before returning, we restore the previous variable bindings:</p>
935 <div class="doc_code">
937 // Pop all our variables from scope.
938 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
939 NamedValues[VarNames[i].first] = OldBindings[i];
941 // Return the body computation.
947 <p>The end result of all of this is that we get properly scoped variable
948 definitions, and we even (trivially) allow mutation of them :).</p>
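<p>The save-and-restore pattern behind this scoping can be tried in isolation.
The C++ sketch below uses hypothetical names (<tt>ToyNamedValues</tt>,
<tt>withVars</tt>) and integer slot ids instead of allocas, with 0 meaning "no
binding"; it only demonstrates the shadowing behaviour and is not the tutorial's
code generator.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol table: name -> slot id, where 0 means "no binding".
static std::map<std::string, int> ToyNamedValues;

// Bind each name to a fresh slot, run Body, then restore whatever each name
// was bound to before (possibly nothing).
template <typename BodyFn>
int withVars(const std::vector<std::string> &Names, int FirstSlot,
             BodyFn Body) {
  std::vector<int> OldBindings;
  for (std::size_t i = 0; i != Names.size(); ++i) {
    OldBindings.push_back(ToyNamedValues[Names[i]]); // remember outer binding
    ToyNamedValues[Names[i]] = FirstSlot + static_cast<int>(i);
  }
  int Result = Body(); // the body sees the new bindings
  for (std::size_t i = 0; i != Names.size(); ++i)
    ToyNamedValues[Names[i]] = OldBindings[i]; // pop our variables
  return Result;
}
```

<p>If an outer <tt>a</tt> exists, a body run under <tt>withVars</tt> for
<tt>a</tt> and <tt>b</tt> sees the inner slots, and afterwards <tt>a</tt> refers
to the outer slot again while <tt>b</tt> is unbound.</p>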
<p>With this, we have completed what we set out to do. Our nice iterative fib
951 example from the intro compiles and runs just fine. The mem2reg pass optimizes
952 all of our stack variables into SSA registers, inserting PHI nodes where needed,
953 and our front-end remains simple: no iterated dominator frontier computation
954 anywhere in sight.</p>
958 <!-- *********************************************************************** -->
959 <div class="doc_section"><a name="code">Full Code Listing</a></div>
960 <!-- *********************************************************************** -->
962 <div class="doc_text">
965 Here is the complete code listing for our running example, enhanced with mutable
966 variables and var/in support. To build this example, use:
969 <div class="doc_code">
972 g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
978 <p>Here is the code:</p>
980 <div class="doc_code">
982 #include "llvm/DerivedTypes.h"
983 #include "llvm/ExecutionEngine/ExecutionEngine.h"
984 #include "llvm/Module.h"
985 #include "llvm/ModuleProvider.h"
986 #include "llvm/PassManager.h"
987 #include "llvm/Analysis/Verifier.h"
988 #include "llvm/Target/TargetData.h"
989 #include "llvm/Transforms/Scalar.h"
990 #include "llvm/Support/LLVMBuilder.h"
991 #include <cstdio>
992 #include <string>
994 #include <vector>
995 using namespace llvm;
997 //===----------------------------------------------------------------------===//
999 //===----------------------------------------------------------------------===//
1001 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1002 // of these for known things.
1007 tok_def = -2, tok_extern = -3,
1010 tok_identifier = -4, tok_number = -5,
1013 tok_if = -6, tok_then = -7, tok_else = -8,
1014 tok_for = -9, tok_in = -10,
1017 tok_binary = -11, tok_unary = -12,
1023 static std::string IdentifierStr; // Filled in if tok_identifier
1024 static double NumVal; // Filled in if tok_number
1026 /// gettok - Return the next token from standard input.
1027 static int gettok() {
1028 static int LastChar = ' ';
1030 // Skip any whitespace.
1031 while (isspace(LastChar))
1032 LastChar = getchar();
1034 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1035 IdentifierStr = LastChar;
1036 while (isalnum((LastChar = getchar())))
1037 IdentifierStr += LastChar;
1039 if (IdentifierStr == "def") return tok_def;
1040 if (IdentifierStr == "extern") return tok_extern;
1041 if (IdentifierStr == "if") return tok_if;
1042 if (IdentifierStr == "then") return tok_then;
1043 if (IdentifierStr == "else") return tok_else;
1044 if (IdentifierStr == "for") return tok_for;
1045 if (IdentifierStr == "in") return tok_in;
1046 if (IdentifierStr == "binary") return tok_binary;
1047 if (IdentifierStr == "unary") return tok_unary;
1048 if (IdentifierStr == "var") return tok_var;
1049 return tok_identifier;
1052 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
1056 LastChar = getchar();
1057 } while (isdigit(LastChar) || LastChar == '.');
1059 NumVal = strtod(NumStr.c_str(), 0);
1063 if (LastChar == '#') {
1064 // Comment until end of line.
1065 do LastChar = getchar();
while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
1068 if (LastChar != EOF)
1072 // Check for end of file. Don't eat the EOF.
1073 if (LastChar == EOF)
1076 // Otherwise, just return the character as its ascii value.
1077 int ThisChar = LastChar;
1078 LastChar = getchar();
1082 //===----------------------------------------------------------------------===//
1083 // Abstract Syntax Tree (aka Parse Tree)
1084 //===----------------------------------------------------------------------===//
1086 /// ExprAST - Base class for all expression nodes.
1089 virtual ~ExprAST() {}
1090 virtual Value *Codegen() = 0;
1093 /// NumberExprAST - Expression class for numeric literals like "1.0".
1094 class NumberExprAST : public ExprAST {
1097 NumberExprAST(double val) : Val(val) {}
1098 virtual Value *Codegen();
1101 /// VariableExprAST - Expression class for referencing a variable, like "a".
1102 class VariableExprAST : public ExprAST {
1105 VariableExprAST(const std::string &name) : Name(name) {}
1106 const std::string &getName() const { return Name; }
1107 virtual Value *Codegen();
1110 /// UnaryExprAST - Expression class for a unary operator.
1111 class UnaryExprAST : public ExprAST {
1115 UnaryExprAST(char opcode, ExprAST *operand)
1116 : Opcode(opcode), Operand(operand) {}
1117 virtual Value *Codegen();
1120 /// BinaryExprAST - Expression class for a binary operator.
1121 class BinaryExprAST : public ExprAST {
1125 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1126 : Op(op), LHS(lhs), RHS(rhs) {}
1127 virtual Value *Codegen();
1130 /// CallExprAST - Expression class for function calls.
1131 class CallExprAST : public ExprAST {
1133 std::vector<ExprAST*> Args;
1135 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
1136 : Callee(callee), Args(args) {}
1137 virtual Value *Codegen();
1140 /// IfExprAST - Expression class for if/then/else.
1141 class IfExprAST : public ExprAST {
1142 ExprAST *Cond, *Then, *Else;
1144 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1145 : Cond(cond), Then(then), Else(_else) {}
1146 virtual Value *Codegen();
1149 /// ForExprAST - Expression class for for/in.
1150 class ForExprAST : public ExprAST {
1151 std::string VarName;
1152 ExprAST *Start, *End, *Step, *Body;
1154 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end,
1155 ExprAST *step, ExprAST *body)
1156 : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1157 virtual Value *Codegen();
1160 /// VarExprAST - Expression class for var/in
1161 class VarExprAST : public ExprAST {
1162 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1165 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames,
1167 : VarNames(varnames), Body(body) {}
1169 virtual Value *Codegen();
1172 /// PrototypeAST - This class represents the "prototype" for a function,
1173 /// which captures its argument names as well as if it is an operator.
1174 class PrototypeAST {
1176 std::vector<std::string> Args;
1178 unsigned Precedence; // Precedence if a binary op.
1180 PrototypeAST(const std::string &name, const std::vector<std::string> &args,
1181 bool isoperator = false, unsigned prec = 0)
1182 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1184 bool isUnaryOp() const { return isOperator && Args.size() == 1; }
1185 bool isBinaryOp() const { return isOperator && Args.size() == 2; }
1187 char getOperatorName() const {
1188 assert(isUnaryOp() || isBinaryOp());
1189 return Name[Name.size()-1];
1192 unsigned getBinaryPrecedence() const { return Precedence; }
1194 Function *Codegen();
1196 void CreateArgumentAllocas(Function *F);
1199 /// FunctionAST - This class represents a function definition itself.
1201 PrototypeAST *Proto;
1204 FunctionAST(PrototypeAST *proto, ExprAST *body)
1205 : Proto(proto), Body(body) {}
1207 Function *Codegen();
1210 //===----------------------------------------------------------------------===//
1212 //===----------------------------------------------------------------------===//
1214 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
/// token the parser is looking at. getNextToken reads another token from the
1216 /// lexer and updates CurTok with its results.
1218 static int getNextToken() {
1219 return CurTok = gettok();
1222 /// BinopPrecedence - This holds the precedence for each binary operator that is
1224 static std::map<char, int> BinopPrecedence;
1226 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1227 static int GetTokPrecedence() {
1228 if (!isascii(CurTok))
1231 // Make sure it's a declared binop.
1232 int TokPrec = BinopPrecedence[CurTok];
1233 if (TokPrec <= 0) return -1;
1237 /// Error* - These are little helper functions for error handling.
1238 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1239 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1240 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1242 static ExprAST *ParseExpression();
/// ::= identifier '(' expression* ')'
1247 static ExprAST *ParseIdentifierExpr() {
1248 std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
1252 if (CurTok != '(') // Simple variable ref.
1253 return new VariableExprAST(IdName);
1256 getNextToken(); // eat (
1257 std::vector<ExprAST*> Args;
1258 if (CurTok != ')') {
1260 ExprAST *Arg = ParseExpression();
1262 Args.push_back(Arg);
1264 if (CurTok == ')') break;
1267 return Error("Expected ')'");
1275 return new CallExprAST(IdName, Args);
1278 /// numberexpr ::= number
1279 static ExprAST *ParseNumberExpr() {
1280 ExprAST *Result = new NumberExprAST(NumVal);
1281 getNextToken(); // consume the number
1285 /// parenexpr ::= '(' expression ')'
1286 static ExprAST *ParseParenExpr() {
1287 getNextToken(); // eat (.
1288 ExprAST *V = ParseExpression();
1292 return Error("expected ')'");
1293 getNextToken(); // eat ).
1297 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1298 static ExprAST *ParseIfExpr() {
1299 getNextToken(); // eat the if.
1302 ExprAST *Cond = ParseExpression();
1303 if (!Cond) return 0;
1305 if (CurTok != tok_then)
1306 return Error("expected then");
1307 getNextToken(); // eat the then
1309 ExprAST *Then = ParseExpression();
1310 if (Then == 0) return 0;
1312 if (CurTok != tok_else)
1313 return Error("expected else");
1317 ExprAST *Else = ParseExpression();
1318 if (!Else) return 0;
1320 return new IfExprAST(Cond, Then, Else);
/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1324 static ExprAST *ParseForExpr() {
1325 getNextToken(); // eat the for.
1327 if (CurTok != tok_identifier)
1328 return Error("expected identifier after for");
1330 std::string IdName = IdentifierStr;
getNextToken(); // eat identifier.
1334 return Error("expected '=' after for");
1335 getNextToken(); // eat '='.
1338 ExprAST *Start = ParseExpression();
1339 if (Start == 0) return 0;
1341 return Error("expected ',' after for start value");
1344 ExprAST *End = ParseExpression();
1345 if (End == 0) return 0;
1347 // The step value is optional.
1349 if (CurTok == ',') {
1351 Step = ParseExpression();
1352 if (Step == 0) return 0;
1355 if (CurTok != tok_in)
1356 return Error("expected 'in' after for");
1357 getNextToken(); // eat 'in'.
1359 ExprAST *Body = ParseExpression();
1360 if (Body == 0) return 0;
1362 return new ForExprAST(IdName, Start, End, Step, Body);
/// varexpr ::= 'var' identifier ('=' expression)?
///             (',' identifier ('=' expression)?)* 'in' expression
1367 static ExprAST *ParseVarExpr() {
1368 getNextToken(); // eat the var.
1370 std::vector<std::pair<std::string, ExprAST*> > VarNames;
1372 // At least one variable name is required.
1373 if (CurTok != tok_identifier)
1374 return Error("expected identifier after var");
1377 std::string Name = IdentifierStr;
getNextToken(); // eat identifier.
1380 // Read the optional initializer.
1382 if (CurTok == '=') {
1383 getNextToken(); // eat the '='.
1385 Init = ParseExpression();
1386 if (Init == 0) return 0;
1389 VarNames.push_back(std::make_pair(Name, Init));
1391 // End of var list, exit loop.
1392 if (CurTok != ',') break;
1393 getNextToken(); // eat the ','.
1395 if (CurTok != tok_identifier)
1396 return Error("expected identifier list after var");
1399 // At this point, we have to have 'in'.
1400 if (CurTok != tok_in)
1401 return Error("expected 'in' keyword after 'var'");
1402 getNextToken(); // eat 'in'.
1404 ExprAST *Body = ParseExpression();
1405 if (Body == 0) return 0;
1407 return new VarExprAST(VarNames, Body);
1412 /// ::= identifierexpr
1418 static ExprAST *ParsePrimary() {
1420 default: return Error("unknown token when expecting an expression");
1421 case tok_identifier: return ParseIdentifierExpr();
1422 case tok_number: return ParseNumberExpr();
1423 case '(': return ParseParenExpr();
1424 case tok_if: return ParseIfExpr();
1425 case tok_for: return ParseForExpr();
1426 case tok_var: return ParseVarExpr();
1433 static ExprAST *ParseUnary() {
1434 // If the current token is not an operator, it must be a primary expr.
1435 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1436 return ParsePrimary();
1438 // If this is a unary operator, read it.
1441 if (ExprAST *Operand = ParseUnary())
1442 return new UnaryExprAST(Opc, Operand);
1447 /// ::= ('+' unary)*
1448 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1449 // If this is a binop, find its precedence.
1451 int TokPrec = GetTokPrecedence();
1453 // If this is a binop that binds at least as tightly as the current binop,
1454 // consume it, otherwise we are done.
1455 if (TokPrec < ExprPrec)
1458 // Okay, we know this is a binop.
1460 getNextToken(); // eat binop
1462 // Parse the unary expression after the binary operator.
1463 ExprAST *RHS = ParseUnary();
1466 // If BinOp binds less tightly with RHS than the operator after RHS, let
1467 // the pending operator take RHS as its LHS.
1468 int NextPrec = GetTokPrecedence();
1469 if (TokPrec < NextPrec) {
1470 RHS = ParseBinOpRHS(TokPrec+1, RHS);
1471 if (RHS == 0) return 0;
1475 LHS = new BinaryExprAST(BinOp, LHS, RHS);
1480 /// ::= unary binoprhs
1482 static ExprAST *ParseExpression() {
1483 ExprAST *LHS = ParseUnary();
1486 return ParseBinOpRHS(0, LHS);
1490 /// ::= id '(' id* ')'
1491 /// ::= binary LETTER number? (id, id)
1492 /// ::= unary LETTER (id)
1493 static PrototypeAST *ParsePrototype() {
1496 int Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1497 unsigned BinaryPrecedence = 30;
1501 return ErrorP("Expected function name in prototype");
1502 case tok_identifier:
1503 FnName = IdentifierStr;
1509 if (!isascii(CurTok))
1510 return ErrorP("Expected unary operator");
1512 FnName += (char)CurTok;
1518 if (!isascii(CurTok))
1519 return ErrorP("Expected binary operator");
1521 FnName += (char)CurTok;
1525 // Read the precedence if present.
1526 if (CurTok == tok_number) {
1527 if (NumVal < 1 || NumVal > 100)
return ErrorP("Invalid precedence: must be 1..100");
1529 BinaryPrecedence = (unsigned)NumVal;
1536 return ErrorP("Expected '(' in prototype");
1538 std::vector<std::string> ArgNames;
1539 while (getNextToken() == tok_identifier)
1540 ArgNames.push_back(IdentifierStr);
1542 return ErrorP("Expected ')' in prototype");
1545 getNextToken(); // eat ')'.
1547 // Verify right number of names for operator.
1548 if (Kind && ArgNames.size() != Kind)
1549 return ErrorP("Invalid number of operands for operator");
1551 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1554 /// definition ::= 'def' prototype expression
1555 static FunctionAST *ParseDefinition() {
1556 getNextToken(); // eat def.
1557 PrototypeAST *Proto = ParsePrototype();
1558 if (Proto == 0) return 0;
1560 if (ExprAST *E = ParseExpression())
1561 return new FunctionAST(Proto, E);
1565 /// toplevelexpr ::= expression
1566 static FunctionAST *ParseTopLevelExpr() {
1567 if (ExprAST *E = ParseExpression()) {
1568 // Make an anonymous proto.
1569 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1570 return new FunctionAST(Proto, E);
1575 /// external ::= 'extern' prototype
1576 static PrototypeAST *ParseExtern() {
1577 getNextToken(); // eat extern.
1578 return ParsePrototype();
1581 //===----------------------------------------------------------------------===//
1583 //===----------------------------------------------------------------------===//
1585 static Module *TheModule;
1586 static LLVMFoldingBuilder Builder;
1587 static std::map<std::string, AllocaInst*> NamedValues;
1588 static FunctionPassManager *TheFPM;
1590 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1592 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1593 /// the function. This is used for mutable variables etc.
1594 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1595 const std::string &VarName) {
1596 LLVMBuilder TmpB(&TheFunction->getEntryBlock(),
1597 TheFunction->getEntryBlock().begin());
1598 return TmpB.CreateAlloca(Type::DoubleTy, 0, VarName.c_str());
1602 Value *NumberExprAST::Codegen() {
1603 return ConstantFP::get(Type::DoubleTy, APFloat(Val));
1606 Value *VariableExprAST::Codegen() {
1607 // Look this variable up in the function.
1608 Value *V = NamedValues[Name];
1609 if (V == 0) return ErrorV("Unknown variable name");
1612 return Builder.CreateLoad(V, Name.c_str());
1615 Value *UnaryExprAST::Codegen() {
1616 Value *OperandV = Operand->Codegen();
1617 if (OperandV == 0) return 0;
1619 Function *F = TheModule->getFunction(std::string("unary")+Opcode);
1621 return ErrorV("Unknown unary operator");
1623 return Builder.CreateCall(F, OperandV, "unop");
1627 Value *BinaryExprAST::Codegen() {
1628 // Special case '=' because we don't want to emit the LHS as an expression.
1630 // Assignment requires the LHS to be an identifier.
1631 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS);
1633 return ErrorV("destination of '=' must be a variable");
1635 Value *Val = RHS->Codegen();
1636 if (Val == 0) return 0;
1638 // Look up the name.
1639 Value *Variable = NamedValues[LHSE->getName()];
1640 if (Variable == 0) return ErrorV("Unknown variable name");
1642 Builder.CreateStore(Val, Variable);
1647 Value *L = LHS->Codegen();
1648 Value *R = RHS->Codegen();
1649 if (L == 0 || R == 0) return 0;
1652 case '+': return Builder.CreateAdd(L, R, "addtmp");
1653 case '-': return Builder.CreateSub(L, R, "subtmp");
1654 case '*': return Builder.CreateMul(L, R, "multmp");
L = Builder.CreateFCmpULT(L, R, "cmptmp");
1657 // Convert bool 0/1 to double 0.0 or 1.0
1658 return Builder.CreateUIToFP(L, Type::DoubleTy, "booltmp");
1662 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1664 Function *F = TheModule->getFunction(std::string("binary")+Op);
1665 assert(F && "binary operator not found!");
1667 Value *Ops[] = { L, R };
1668 return Builder.CreateCall(F, Ops, Ops+2, "binop");
1671 Value *CallExprAST::Codegen() {
1672 // Look up the name in the global module table.
1673 Function *CalleeF = TheModule->getFunction(Callee);
1675 return ErrorV("Unknown function referenced");
// Report an error if the argument count does not match.
1678 if (CalleeF->arg_size() != Args.size())
1679 return ErrorV("Incorrect # arguments passed");
1681 std::vector<Value*> ArgsV;
1682 for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1683 ArgsV.push_back(Args[i]->Codegen());
1684 if (ArgsV.back() == 0) return 0;
1687 return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1690 Value *IfExprAST::Codegen() {
1691 Value *CondV = Cond->Codegen();
1692 if (CondV == 0) return 0;
// Convert condition to a bool by comparing non-equal to 0.0.
1695 CondV = Builder.CreateFCmpONE(CondV,
1696 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1699 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1701 // Create blocks for the then and else cases. Insert the 'then' block at the
1702 // end of the function.
1703 BasicBlock *ThenBB = new BasicBlock("then", TheFunction);
1704 BasicBlock *ElseBB = new BasicBlock("else");
1705 BasicBlock *MergeBB = new BasicBlock("ifcont");
1707 Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1710 Builder.SetInsertPoint(ThenBB);
1712 Value *ThenV = Then->Codegen();
1713 if (ThenV == 0) return 0;
1715 Builder.CreateBr(MergeBB);
1716 // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1717 ThenBB = Builder.GetInsertBlock();
1720 TheFunction->getBasicBlockList().push_back(ElseBB);
1721 Builder.SetInsertPoint(ElseBB);
1723 Value *ElseV = Else->Codegen();
1724 if (ElseV == 0) return 0;
1726 Builder.CreateBr(MergeBB);
1727 // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1728 ElseBB = Builder.GetInsertBlock();
1730 // Emit merge block.
1731 TheFunction->getBasicBlockList().push_back(MergeBB);
1732 Builder.SetInsertPoint(MergeBB);
1733 PHINode *PN = Builder.CreatePHI(Type::DoubleTy, "iftmp");
1735 PN->addIncoming(ThenV, ThenBB);
1736 PN->addIncoming(ElseV, ElseBB);
1740 Value *ForExprAST::Codegen() {
1742 // var = alloca double
1744 // start = startexpr
1745 // store start -> var
1753 // endcond = endexpr
1755 // curvar = load var
1756 // nextvar = curvar + step
1757 // store nextvar -> var
1758 // br endcond, loop, endloop
1761 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1763 // Create an alloca for the variable in the entry block.
1764 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1766 // Emit the start code first, without 'variable' in scope.
1767 Value *StartVal = Start->Codegen();
1768 if (StartVal == 0) return 0;
1770 // Store the value into the alloca.
1771 Builder.CreateStore(StartVal, Alloca);
1773 // Make the new basic block for the loop header, inserting after current
1775 BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1776 BasicBlock *LoopBB = new BasicBlock("loop", TheFunction);
1778 // Insert an explicit fall through from the current block to the LoopBB.
1779 Builder.CreateBr(LoopBB);
1781 // Start insertion in LoopBB.
1782 Builder.SetInsertPoint(LoopBB);
// Within the loop, the variable is bound to its alloca. If it
// shadows an existing variable, we have to restore it, so save it now.
1786 AllocaInst *OldVal = NamedValues[VarName];
1787 NamedValues[VarName] = Alloca;
1789 // Emit the body of the loop. This, like any other expr, can change the
1790 // current BB. Note that we ignore the value computed by the body, but don't
1792 if (Body->Codegen() == 0)
1795 // Emit the step value.
1798 StepVal = Step->Codegen();
1799 if (StepVal == 0) return 0;
1801 // If not specified, use 1.0.
1802 StepVal = ConstantFP::get(Type::DoubleTy, APFloat(1.0));
1805 // Compute the end condition.
1806 Value *EndCond = End->Codegen();
1807 if (EndCond == 0) return EndCond;
// Reload, increment, and store back to the alloca. This handles the case where
1810 // the body of the loop mutates the variable.
1811 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1812 Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1813 Builder.CreateStore(NextVar, Alloca);
// Convert condition to a bool by comparing non-equal to 0.0.
1816 EndCond = Builder.CreateFCmpONE(EndCond,
1817 ConstantFP::get(Type::DoubleTy, APFloat(0.0)),
1820 // Create the "after loop" block and insert it.
1821 BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1822 BasicBlock *AfterBB = new BasicBlock("afterloop", TheFunction);
1824 // Insert the conditional branch into the end of LoopEndBB.
1825 Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1827 // Any new code will be inserted in AfterBB.
1828 Builder.SetInsertPoint(AfterBB);
1830 // Restore the unshadowed variable.
1832 NamedValues[VarName] = OldVal;
1834 NamedValues.erase(VarName);
1837 // for expr always returns 0.0.
1838 return Constant::getNullValue(Type::DoubleTy);
1841 Value *VarExprAST::Codegen() {
1842 std::vector<AllocaInst *> OldBindings;
1844 Function *TheFunction = Builder.GetInsertBlock()->getParent();
1846 // Register all variables and emit their initializer.
1847 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1848 const std::string &VarName = VarNames[i].first;
1849 ExprAST *Init = VarNames[i].second;
// Emit the initializer before adding the variable to scope; this prevents
1852 // the initializer from referencing the variable itself, and permits stuff
1855 // var a = a in ... # refers to outer 'a'.
1858 InitVal = Init->Codegen();
1859 if (InitVal == 0) return 0;
1860 } else { // If not specified, use 0.0.
1861 InitVal = ConstantFP::get(Type::DoubleTy, APFloat(0.0));
1864 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1865 Builder.CreateStore(InitVal, Alloca);
1867 // Remember the old variable binding so that we can restore the binding when
1869 OldBindings.push_back(NamedValues[VarName]);
1871 // Remember this binding.
1872 NamedValues[VarName] = Alloca;
1875 // Codegen the body, now that all vars are in scope.
1876 Value *BodyVal = Body->Codegen();
1877 if (BodyVal == 0) return 0;
1879 // Pop all our variables from scope.
1880 for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1881 NamedValues[VarNames[i].first] = OldBindings[i];
1883 // Return the body computation.
1888 Function *PrototypeAST::Codegen() {
1889 // Make the function type: double(double,double) etc.
1890 std::vector<const Type*> Doubles(Args.size(), Type::DoubleTy);
1891 FunctionType *FT = FunctionType::get(Type::DoubleTy, Doubles, false);
1893 Function *F = new Function(FT, Function::ExternalLinkage, Name, TheModule);
1895 // If F conflicted, there was already something named 'Name'. If it has a
1896 // body, don't allow redefinition or reextern.
1897 if (F->getName() != Name) {
1898 // Delete the one we just made and get the existing one.
1899 F->eraseFromParent();
1900 F = TheModule->getFunction(Name);
1902 // If F already has a body, reject this.
1903 if (!F->empty()) {
1904 ErrorF("redefinition of function");
1908 // If F took a different number of args, reject.
1909 if (F->arg_size() != Args.size()) {
1910 ErrorF("redefinition of function with different # args");
1915 // Set names for all arguments.
1917 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size();
1919 AI->setName(Args[Idx]);
1924 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1925 /// argument in the symbol table so that references to it will succeed.
1926 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1927 Function::arg_iterator AI = F->arg_begin();
1928 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1929 // Create an alloca for this variable.
1930 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1932 // Store the initial value into the alloca.
1933 Builder.CreateStore(AI, Alloca);
1935 // Add arguments to variable symbol table.
1936 NamedValues[Args[Idx]] = Alloca;
1941 Function *FunctionAST::Codegen() {
1942 NamedValues.clear();
1944 Function *TheFunction = Proto->Codegen();
1945 if (TheFunction == 0)
1948 // If this is an operator, install it.
1949 if (Proto->isBinaryOp())
1950 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
1952 // Create a new basic block to start insertion into.
1953 BasicBlock *BB = new BasicBlock("entry", TheFunction);
1954 Builder.SetInsertPoint(BB);
1956 // Add all arguments to the symbol table and create their allocas.
1957 Proto->CreateArgumentAllocas(TheFunction);
1959 if (Value *RetVal = Body->Codegen()) {
1960 // Finish off the function.
1961 Builder.CreateRet(RetVal);
1963 // Validate the generated code, checking for consistency.
1964 verifyFunction(*TheFunction);
1966 // Optimize the function.
1967 TheFPM->run(*TheFunction);
1972 // Error reading body, remove function.
1973 TheFunction->eraseFromParent();
1975 if (Proto->isBinaryOp())
1976 BinopPrecedence.erase(Proto->getOperatorName());
1980 //===----------------------------------------------------------------------===//
1981 // Top-Level parsing and JIT Driver
1982 //===----------------------------------------------------------------------===//
1984 static ExecutionEngine *TheExecutionEngine;
1986 static void HandleDefinition() {
1987 if (FunctionAST *F = ParseDefinition()) {
1988 if (Function *LF = F->Codegen()) {
1989 fprintf(stderr, "Read function definition:");
1993 // Skip token for error recovery.
1998 static void HandleExtern() {
1999 if (PrototypeAST *P = ParseExtern()) {
2000 if (Function *F = P->Codegen()) {
2001 fprintf(stderr, "Read extern: ");
2005 // Skip token for error recovery.
2010 static void HandleTopLevelExpression() {
2011 // Evaluate a top level expression into an anonymous function.
2012 if (FunctionAST *F = ParseTopLevelExpr()) {
2013 if (Function *LF = F->Codegen()) {
2014 // JIT the function, returning a function pointer.
2015 void *FPtr = TheExecutionEngine->getPointerToFunction(LF);
2017 // Cast it to the right type (takes no arguments, returns a double) so we
2018 // can call it as a native function.
2019 double (*FP)() = (double (*)())FPtr;
2020 fprintf(stderr, "Evaluated to %f\n", FP());
2023 // Skip token for error recovery.
2028 /// top ::= definition | external | expression | ';'
2029 static void MainLoop() {
2031 fprintf(stderr, "ready> ");
2033 case tok_eof: return;
2034 case ';': getNextToken(); break; // ignore top level semicolons.
2035 case tok_def: HandleDefinition(); break;
2036 case tok_extern: HandleExtern(); break;
2037 default: HandleTopLevelExpression(); break;
2044 //===----------------------------------------------------------------------===//
2045 // "Library" functions that can be "extern'd" from user code.
2046 //===----------------------------------------------------------------------===//
2048 /// putchard - putchar that takes a double and returns 0.
2050 double putchard(double X) {
/// printd - printf that takes a double and prints it as "%f\n", returning 0.
2057 double printd(double X) {
2062 //===----------------------------------------------------------------------===//
2063 // Main driver code.
2064 //===----------------------------------------------------------------------===//
2067 // Install standard binary operators.
2068 // 1 is lowest precedence.
2069 BinopPrecedence['='] = 2;
2070 BinopPrecedence['<'] = 10;
2071 BinopPrecedence['+'] = 20;
2072 BinopPrecedence['-'] = 20;
2073 BinopPrecedence['*'] = 40; // highest.
2075 // Prime the first token.
2076 fprintf(stderr, "ready> ");
2079 // Make the module, which holds all the code.
2080 TheModule = new Module("my cool jit");
2083 TheExecutionEngine = ExecutionEngine::create(TheModule);
2086 ExistingModuleProvider OurModuleProvider(TheModule);
2087 FunctionPassManager OurFPM(&OurModuleProvider);
2089 // Set up the optimizer pipeline. Start with registering info about how the
2090 // target lays out data structures.
2091 OurFPM.add(new TargetData(*TheExecutionEngine->getTargetData()));
2092 // Promote allocas to registers.
2093 OurFPM.add(createPromoteMemoryToRegisterPass());
2094 // Do simple "peephole" optimizations and bit-twiddling optzns.
2095 OurFPM.add(createInstructionCombiningPass());
2096 // Reassociate expressions.
2097 OurFPM.add(createReassociatePass());
2098 // Eliminate Common SubExpressions.
2099 OurFPM.add(createGVNPass());
2100 // Simplify the control flow graph (deleting unreachable blocks, etc).
2101 OurFPM.add(createCFGSimplificationPass());
2103 // Set the global so the code gen can use this.
2104 TheFPM = &OurFPM;
2106 // Run the main "interpreter loop" now.
2110 } // Free module provider and pass manager.
2113 // Print out all of the generated code.
2114 TheModule->dump();
2122 <!-- *********************************************************************** -->
2130 <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2131 <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2132 Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $