1 ===========================================
2 Kaleidoscope: Implementing a Parser and AST
3 ===========================================
8 Written by `Chris Lattner <mailto:sabre@nondot.org>`_
10 Chapter 2 Introduction
11 ======================
13 Welcome to Chapter 2 of the "`Implementing a language with
14 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
15 lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
16 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
17 language. Once we have a parser, we'll define and build an `Abstract
18 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
20 The parser we will build uses a combination of `Recursive Descent
21 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
23 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
24 parse the Kaleidoscope language (the latter for binary expressions and
25 the former for everything else). Before we get to parsing though, lets
26 talk about the output of the parser: the Abstract Syntax Tree.
28 The Abstract Syntax Tree (AST)
29 ==============================
31 The AST for a program captures its behavior in such a way that it is
32 easy for later stages of the compiler (e.g. code generation) to
33 interpret. We basically want one object for each construct in the
34 language, and the AST should closely model the language. In
35 Kaleidoscope, we have expressions, a prototype, and a function object.
36 We'll start with expressions first:
40 /// ExprAST - Base class for all expression nodes.
46 /// NumberExprAST - Expression class for numeric literals like "1.0".
47 class NumberExprAST : public ExprAST {
50 NumberExprAST(double val) : Val(val) {}
53 The code above shows the definition of the base ExprAST class and one
54 subclass which we use for numeric literals. The important thing to note
55 about this code is that the NumberExprAST class captures the numeric
56 value of the literal as an instance variable. This allows later phases
57 of the compiler to know what the stored numeric value is.
59 Right now we only create the AST, so there are no useful accessor
60 methods on them. It would be very easy to add a virtual method to pretty
61 print the code, for example. Here are the other expression AST node
62 definitions that we'll use in the basic form of the Kaleidoscope
67 /// VariableExprAST - Expression class for referencing a variable, like "a".
68 class VariableExprAST : public ExprAST {
71 VariableExprAST(const std::string &name) : Name(name) {}
74 /// BinaryExprAST - Expression class for a binary operator.
75 class BinaryExprAST : public ExprAST {
79 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
80 : Op(op), LHS(lhs), RHS(rhs) {}
83 /// CallExprAST - Expression class for function calls.
84 class CallExprAST : public ExprAST {
86 std::vector<ExprAST*> Args;
88 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
89 : Callee(callee), Args(args) {}
92 This is all (intentionally) rather straight-forward: variables capture
93 the variable name, binary operators capture their opcode (e.g. '+'), and
94 calls capture a function name as well as a list of any argument
95 expressions. One thing that is nice about our AST is that it captures
96 the language features without talking about the syntax of the language.
97 Note that there is no discussion about precedence of binary operators,
98 lexical structure, etc.
100 For our basic language, these are all of the expression nodes we'll
101 define. Because it doesn't have conditional control flow, it isn't
102 Turing-complete; we'll fix that in a later installment. The two things
103 we need next are a way to talk about the interface to a function, and a
104 way to talk about functions themselves:
108 /// PrototypeAST - This class represents the "prototype" for a function,
109 /// which captures its name, and its argument names (thus implicitly the number
110 /// of arguments the function takes).
113 std::vector<std::string> Args;
115 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
116 : Name(name), Args(args) {}
119 /// FunctionAST - This class represents a function definition itself.
124 FunctionAST(PrototypeAST *proto, ExprAST *body)
125 : Proto(proto), Body(body) {}
128 In Kaleidoscope, functions are typed with just a count of their
129 arguments. Since all values are double precision floating point, the
130 type of each argument doesn't need to be stored anywhere. In a more
131 aggressive and realistic language, the "ExprAST" class would probably
134 With this scaffolding, we can now talk about parsing expressions and
135 function bodies in Kaleidoscope.
140 Now that we have an AST to build, we need to define the parser code to
141 build it. The idea here is that we want to parse something like "x+y"
142 (which is returned as three tokens by the lexer) into an AST that could
143 be generated with calls like this:
147 ExprAST *X = new VariableExprAST("x");
148 ExprAST *Y = new VariableExprAST("y");
149 ExprAST *Result = new BinaryExprAST('+', X, Y);
151 In order to do this, we'll start by defining some basic helper routines:
155 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
156 /// token the parser is looking at. getNextToken reads another token from the
157 /// lexer and updates CurTok with its results.
159 static int getNextToken() {
160 return CurTok = gettok();
163 This implements a simple token buffer around the lexer. This allows us
164 to look one token ahead at what the lexer is returning. Every function
165 in our parser will assume that CurTok is the current token that needs to
171 /// Error* - These are little helper functions for error handling.
172 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
173 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
174 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
176 The ``Error`` routines are simple helper routines that our parser will
177 use to handle errors. The error recovery in our parser will not be the
178 best and is not particular user-friendly, but it will be enough for our
179 tutorial. These routines make it easier to handle errors in routines
180 that have various return types: they always return null.
182 With these basic helper functions, we can implement the first piece of
183 our grammar: numeric literals.
185 Basic Expression Parsing
186 ========================
188 We start with numeric literals, because they are the simplest to
189 process. For each production in our grammar, we'll define a function
190 which parses that production. For numeric literals, we have:
194 /// numberexpr ::= number
195 static ExprAST *ParseNumberExpr() {
196 ExprAST *Result = new NumberExprAST(NumVal);
197 getNextToken(); // consume the number
201 This routine is very simple: it expects to be called when the current
202 token is a ``tok_number`` token. It takes the current number value,
203 creates a ``NumberExprAST`` node, advances the lexer to the next token,
206 There are some interesting aspects to this. The most important one is
207 that this routine eats all of the tokens that correspond to the
208 production and returns the lexer buffer with the next token (which is
209 not part of the grammar production) ready to go. This is a fairly
210 standard way to go for recursive descent parsers. For a better example,
211 the parenthesis operator is defined like this:
215 /// parenexpr ::= '(' expression ')'
216 static ExprAST *ParseParenExpr() {
217 getNextToken(); // eat (.
218 ExprAST *V = ParseExpression();
222 return Error("expected ')'");
223 getNextToken(); // eat ).
227 This function illustrates a number of interesting things about the
230 1) It shows how we use the Error routines. When called, this function
231 expects that the current token is a '(' token, but after parsing the
232 subexpression, it is possible that there is no ')' waiting. For example,
233 if the user types in "(4 x" instead of "(4)", the parser should emit an
234 error. Because errors can occur, the parser needs a way to indicate that
235 they happened: in our parser, we return null on an error.
237 2) Another interesting aspect of this function is that it uses recursion
238 by calling ``ParseExpression`` (we will soon see that
239 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
240 because it allows us to handle recursive grammars, and keeps each
241 production very simple. Note that parentheses do not cause construction
242 of AST nodes themselves. While we could do it this way, the most
243 important role of parentheses are to guide the parser and provide
244 grouping. Once the parser constructs the AST, parentheses are not
247 The next simple production is for handling variable references and
254 /// ::= identifier '(' expression* ')'
255 static ExprAST *ParseIdentifierExpr() {
256 std::string IdName = IdentifierStr;
258 getNextToken(); // eat identifier.
260 if (CurTok != '(') // Simple variable ref.
261 return new VariableExprAST(IdName);
264 getNextToken(); // eat (
265 std::vector<ExprAST*> Args;
268 ExprAST *Arg = ParseExpression();
272 if (CurTok == ')') break;
275 return Error("Expected ')' or ',' in argument list");
283 return new CallExprAST(IdName, Args);
286 This routine follows the same style as the other routines. (It expects
287 to be called if the current token is a ``tok_identifier`` token). It
288 also has recursion and error handling. One interesting aspect of this is
289 that it uses *look-ahead* to determine if the current identifier is a
290 stand alone variable reference or if it is a function call expression.
291 It handles this by checking to see if the token after the identifier is
292 a '(' token, constructing either a ``VariableExprAST`` or
293 ``CallExprAST`` node as appropriate.
295 Now that we have all of our simple expression-parsing logic in place, we
296 can define a helper function to wrap it together into one entry point.
297 We call this class of expressions "primary" expressions, for reasons
298 that will become more clear `later in the
299 tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary
300 primary expression, we need to determine what sort of expression it is:
305 /// ::= identifierexpr
308 static ExprAST *ParsePrimary() {
310 default: return Error("unknown token when expecting an expression");
311 case tok_identifier: return ParseIdentifierExpr();
312 case tok_number: return ParseNumberExpr();
313 case '(': return ParseParenExpr();
317 Now that you see the definition of this function, it is more obvious why
318 we can assume the state of CurTok in the various functions. This uses
319 look-ahead to determine which sort of expression is being inspected, and
320 then parses it with a function call.
322 Now that basic expressions are handled, we need to handle binary
323 expressions. They are a bit more complex.
325 Binary Expression Parsing
326 =========================
328 Binary expressions are significantly harder to parse because they are
329 often ambiguous. For example, when given the string "x+y\*z", the parser
330 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
331 definitions from mathematics, we expect the later parse, because "\*"
332 (multiplication) has higher *precedence* than "+" (addition).
334 There are many ways to handle this, but an elegant and efficient way is
335 to use `Operator-Precedence
336 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
337 This parsing technique uses the precedence of binary operators to guide
338 recursion. To start with, we need a table of precedences:
342 /// BinopPrecedence - This holds the precedence for each binary operator that is
344 static std::map<char, int> BinopPrecedence;
346 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
347 static int GetTokPrecedence() {
348 if (!isascii(CurTok))
351 // Make sure it's a declared binop.
352 int TokPrec = BinopPrecedence[CurTok];
353 if (TokPrec <= 0) return -1;
358 // Install standard binary operators.
359 // 1 is lowest precedence.
360 BinopPrecedence['<'] = 10;
361 BinopPrecedence['+'] = 20;
362 BinopPrecedence['-'] = 20;
363 BinopPrecedence['*'] = 40; // highest.
367 For the basic form of Kaleidoscope, we will only support 4 binary
368 operators (this can obviously be extended by you, our brave and intrepid
369 reader). The ``GetTokPrecedence`` function returns the precedence for
370 the current token, or -1 if the token is not a binary operator. Having a
371 map makes it easy to add new operators and makes it clear that the
372 algorithm doesn't depend on the specific operators involved, but it
373 would be easy enough to eliminate the map and do the comparisons in the
374 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
376 With the helper above defined, we can now start parsing binary
377 expressions. The basic idea of operator precedence parsing is to break
378 down an expression with potentially ambiguous binary operators into
379 pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g".
380 Operator precedence parsing considers this as a stream of primary
381 expressions separated by binary operators. As such, it will first parse
382 the leading primary expression "a", then it will see the pairs [+, b]
383 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
384 primary expressions, the binary expression parser doesn't need to worry
385 about nested subexpressions like (c+d) at all.
387 To start, an expression is a primary expression potentially followed by
388 a sequence of [binop,primaryexpr] pairs:
393 /// ::= primary binoprhs
395 static ExprAST *ParseExpression() {
396 ExprAST *LHS = ParsePrimary();
399 return ParseBinOpRHS(0, LHS);
402 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
403 us. It takes a precedence and a pointer to an expression for the part
404 that has been parsed so far. Note that "x" is a perfectly valid
405 expression: As such, "binoprhs" is allowed to be empty, in which case it
406 returns the expression that is passed into it. In our example above, the
407 code passes the expression for "a" into ``ParseBinOpRHS`` and the
408 current token is "+".
410 The precedence value passed into ``ParseBinOpRHS`` indicates the
411 *minimal operator precedence* that the function is allowed to eat. For
412 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
413 passed in a precedence of 40, it will not consume any tokens (because
414 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
420 /// ::= ('+' primary)*
421 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
422 // If this is a binop, find its precedence.
424 int TokPrec = GetTokPrecedence();
426 // If this is a binop that binds at least as tightly as the current binop,
427 // consume it, otherwise we are done.
428 if (TokPrec < ExprPrec)
431 This code gets the precedence of the current token and checks to see if
432 if is too low. Because we defined invalid tokens to have a precedence of
433 -1, this check implicitly knows that the pair-stream ends when the token
434 stream runs out of binary operators. If this check succeeds, we know
435 that the token is a binary operator and that it will be included in this
440 // Okay, we know this is a binop.
442 getNextToken(); // eat binop
444 // Parse the primary expression after the binary operator.
445 ExprAST *RHS = ParsePrimary();
448 As such, this code eats (and remembers) the binary operator and then
449 parses the primary expression that follows. This builds up the whole
450 pair, the first of which is [+, b] for the running example.
452 Now that we parsed the left-hand side of an expression and one pair of
453 the RHS sequence, we have to decide which way the expression associates.
454 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
455 unparsed)". To determine this, we look ahead at "binop" to determine its
456 precedence and compare it to BinOp's precedence (which is '+' in this
461 // If BinOp binds less tightly with RHS than the operator after RHS, let
462 // the pending operator take RHS as its LHS.
463 int NextPrec = GetTokPrecedence();
464 if (TokPrec < NextPrec) {
466 If the precedence of the binop to the right of "RHS" is lower or equal
467 to the precedence of our current operator, then we know that the
468 parentheses associate as "(a+b) binop ...". In our example, the current
469 operator is "+" and the next operator is "+", we know that they have the
470 same precedence. In this case we'll create the AST node for "a+b", and
471 then continue parsing:
475 ... if body omitted ...
479 LHS = new BinaryExprAST(BinOp, LHS, RHS);
480 } // loop around to the top of the while loop.
483 In our example above, this will turn "a+b+" into "(a+b)" and execute the
484 next iteration of the loop, with "+" as the current token. The code
485 above will eat, remember, and parse "(c+d)" as the primary expression,
486 which makes the current pair equal to [+, (c+d)]. It will then evaluate
487 the 'if' conditional above with "\*" as the binop to the right of the
488 primary. In this case, the precedence of "\*" is higher than the
489 precedence of "+" so the if condition will be entered.
491 The critical question left here is "how can the if condition parse the
492 right hand side in full"? In particular, to build the AST correctly for
493 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
494 variable. The code to do this is surprisingly simple (code from the
495 above two blocks duplicated for context):
499 // If BinOp binds less tightly with RHS than the operator after RHS, let
500 // the pending operator take RHS as its LHS.
501 int NextPrec = GetTokPrecedence();
502 if (TokPrec < NextPrec) {
503 RHS = ParseBinOpRHS(TokPrec+1, RHS);
504 if (RHS == 0) return 0;
507 LHS = new BinaryExprAST(BinOp, LHS, RHS);
508 } // loop around to the top of the while loop.
511 At this point, we know that the binary operator to the RHS of our
512 primary has higher precedence than the binop we are currently parsing.
513 As such, we know that any sequence of pairs whose operators are all
514 higher precedence than "+" should be parsed together and returned as
515 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
516 specifying "TokPrec+1" as the minimum precedence required for it to
517 continue. In our example above, this will cause it to return the AST
518 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
521 Finally, on the next iteration of the while loop, the "+g" piece is
522 parsed and added to the AST. With this little bit of code (14
523 non-trivial lines), we correctly handle fully general binary expression
524 parsing in a very elegant way. This was a whirlwind tour of this code,
525 and it is somewhat subtle. I recommend running through it with a few
526 tough examples to see how it works.
528 This wraps up handling of expressions. At this point, we can point the
529 parser at an arbitrary token stream and build an expression from it,
530 stopping at the first token that is not part of the expression. Next up
531 we need to handle function definitions, etc.
536 The next thing missing is handling of function prototypes. In
537 Kaleidoscope, these are used both for 'extern' function declarations as
538 well as function body definitions. The code to do this is
539 straight-forward and not very interesting (once you've survived
545 /// ::= id '(' id* ')'
546 static PrototypeAST *ParsePrototype() {
547 if (CurTok != tok_identifier)
548 return ErrorP("Expected function name in prototype");
550 std::string FnName = IdentifierStr;
554 return ErrorP("Expected '(' in prototype");
556 // Read the list of argument names.
557 std::vector<std::string> ArgNames;
558 while (getNextToken() == tok_identifier)
559 ArgNames.push_back(IdentifierStr);
561 return ErrorP("Expected ')' in prototype");
564 getNextToken(); // eat ')'.
566 return new PrototypeAST(FnName, ArgNames);
569 Given this, a function definition is very simple, just a prototype plus
570 an expression to implement the body:
574 /// definition ::= 'def' prototype expression
575 static FunctionAST *ParseDefinition() {
576 getNextToken(); // eat def.
577 PrototypeAST *Proto = ParsePrototype();
578 if (Proto == 0) return 0;
580 if (ExprAST *E = ParseExpression())
581 return new FunctionAST(Proto, E);
585 In addition, we support 'extern' to declare functions like 'sin' and
586 'cos' as well as to support forward declaration of user functions. These
587 'extern's are just prototypes with no body:
591 /// external ::= 'extern' prototype
592 static PrototypeAST *ParseExtern() {
593 getNextToken(); // eat extern.
594 return ParsePrototype();
597 Finally, we'll also let the user type in arbitrary top-level expressions
598 and evaluate them on the fly. We will handle this by defining anonymous
599 nullary (zero argument) functions for them:
603 /// toplevelexpr ::= expression
604 static FunctionAST *ParseTopLevelExpr() {
605 if (ExprAST *E = ParseExpression()) {
606 // Make an anonymous proto.
607 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
608 return new FunctionAST(Proto, E);
613 Now that we have all the pieces, let's build a little driver that will
614 let us actually *execute* this code we've built!
619 The driver for this simply invokes all of the parsing pieces with a
620 top-level dispatch loop. There isn't much interesting here, so I'll just
621 include the top-level loop. See `below <#code>`_ for full code in the
622 "Top-Level Parsing" section.
626 /// top ::= definition | external | expression | ';'
627 static void MainLoop() {
629 fprintf(stderr, "ready> ");
631 case tok_eof: return;
632 case ';': getNextToken(); break; // ignore top-level semicolons.
633 case tok_def: HandleDefinition(); break;
634 case tok_extern: HandleExtern(); break;
635 default: HandleTopLevelExpression(); break;
640 The most interesting part of this is that we ignore top-level
641 semicolons. Why is this, you ask? The basic reason is that if you type
642 "4 + 5" at the command line, the parser doesn't know whether that is the
643 end of what you will type or not. For example, on the next line you
644 could type "def foo..." in which case 4+5 is the end of a top-level
645 expression. Alternatively you could type "\* 6", which would continue
646 the expression. Having top-level semicolons allows you to type "4+5;",
647 and the parser will know you are done.
652 With just under 400 lines of commented code (240 lines of non-comment,
653 non-blank code), we fully defined our minimal language, including a
654 lexer, parser, and AST builder. With this done, the executable will
655 validate Kaleidoscope code and tell us if it is grammatically invalid.
656 For example, here is a sample interaction:
661 ready> def foo(x y) x+foo(y, 4.0);
662 Parsed a function definition.
663 ready> def foo(x y) x+y y;
664 Parsed a function definition.
665 Parsed a top-level expr
666 ready> def foo(x y) x+y );
667 Parsed a function definition.
668 Error: unknown token when expecting an expression
669 ready> extern sin(a);
670 ready> Parsed an extern
674 There is a lot of room for extension here. You can define new AST nodes,
675 extend the language in many ways, etc. In the `next
676 installment <LangImpl3.html>`_, we will describe how to generate LLVM
677 Intermediate Representation (IR) from the AST.
682 Here is the complete code listing for this and the previous chapter.
683 Note that it is fully self-contained: you don't need LLVM or any
684 external libraries at all for this. (Besides the C and C++ standard
685 libraries, of course.) To build this, just compile with:
690 clang++ -g -O3 toy.cpp
704 //===----------------------------------------------------------------------===//
706 //===----------------------------------------------------------------------===//
708 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
709 // of these for known things.
714 tok_def = -2, tok_extern = -3,
717 tok_identifier = -4, tok_number = -5
720 static std::string IdentifierStr; // Filled in if tok_identifier
721 static double NumVal; // Filled in if tok_number
723 /// gettok - Return the next token from standard input.
724 static int gettok() {
725 static int LastChar = ' ';
727 // Skip any whitespace.
728 while (isspace(LastChar))
729 LastChar = getchar();
731 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
732 IdentifierStr = LastChar;
733 while (isalnum((LastChar = getchar())))
734 IdentifierStr += LastChar;
736 if (IdentifierStr == "def") return tok_def;
737 if (IdentifierStr == "extern") return tok_extern;
738 return tok_identifier;
741 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
745 LastChar = getchar();
746 } while (isdigit(LastChar) || LastChar == '.');
748 NumVal = strtod(NumStr.c_str(), 0);
752 if (LastChar == '#') {
753 // Comment until end of line.
754 do LastChar = getchar();
755 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
761 // Check for end of file. Don't eat the EOF.
765 // Otherwise, just return the character as its ascii value.
766 int ThisChar = LastChar;
767 LastChar = getchar();
771 //===----------------------------------------------------------------------===//
772 // Abstract Syntax Tree (aka Parse Tree)
773 //===----------------------------------------------------------------------===//
775 /// ExprAST - Base class for all expression nodes.
778 virtual ~ExprAST() {}
781 /// NumberExprAST - Expression class for numeric literals like "1.0".
782 class NumberExprAST : public ExprAST {
785 NumberExprAST(double val) : Val(val) {}
788 /// VariableExprAST - Expression class for referencing a variable, like "a".
789 class VariableExprAST : public ExprAST {
792 VariableExprAST(const std::string &name) : Name(name) {}
795 /// BinaryExprAST - Expression class for a binary operator.
796 class BinaryExprAST : public ExprAST {
800 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
801 : Op(op), LHS(lhs), RHS(rhs) {}
804 /// CallExprAST - Expression class for function calls.
805 class CallExprAST : public ExprAST {
807 std::vector<ExprAST*> Args;
809 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args)
810 : Callee(callee), Args(args) {}
813 /// PrototypeAST - This class represents the "prototype" for a function,
814 /// which captures its name, and its argument names (thus implicitly the number
815 /// of arguments the function takes).
818 std::vector<std::string> Args;
820 PrototypeAST(const std::string &name, const std::vector<std::string> &args)
821 : Name(name), Args(args) {}
825 /// FunctionAST - This class represents a function definition itself.
830 FunctionAST(PrototypeAST *proto, ExprAST *body)
831 : Proto(proto), Body(body) {}
835 //===----------------------------------------------------------------------===//
837 //===----------------------------------------------------------------------===//
839 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
840 /// token the parser is looking at. getNextToken reads another token from the
841 /// lexer and updates CurTok with its results.
843 static int getNextToken() {
844 return CurTok = gettok();
847 /// BinopPrecedence - This holds the precedence for each binary operator that is
849 static std::map<char, int> BinopPrecedence;
851 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
852 static int GetTokPrecedence() {
853 if (!isascii(CurTok))
856 // Make sure it's a declared binop.
857 int TokPrec = BinopPrecedence[CurTok];
858 if (TokPrec <= 0) return -1;
862 /// Error* - These are little helper functions for error handling.
863 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
864 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
865 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
867 static ExprAST *ParseExpression();
871 /// ::= identifier '(' expression* ')'
872 static ExprAST *ParseIdentifierExpr() {
873 std::string IdName = IdentifierStr;
875 getNextToken(); // eat identifier.
877 if (CurTok != '(') // Simple variable ref.
878 return new VariableExprAST(IdName);
881 getNextToken(); // eat (
882 std::vector<ExprAST*> Args;
885 ExprAST *Arg = ParseExpression();
889 if (CurTok == ')') break;
892 return Error("Expected ')' or ',' in argument list");
900 return new CallExprAST(IdName, Args);
903 /// numberexpr ::= number
904 static ExprAST *ParseNumberExpr() {
905 ExprAST *Result = new NumberExprAST(NumVal);
906 getNextToken(); // consume the number
910 /// parenexpr ::= '(' expression ')'
911 static ExprAST *ParseParenExpr() {
912 getNextToken(); // eat (.
913 ExprAST *V = ParseExpression();
917 return Error("expected ')'");
918 getNextToken(); // eat ).
923 /// ::= identifierexpr
926 static ExprAST *ParsePrimary() {
928 default: return Error("unknown token when expecting an expression");
929 case tok_identifier: return ParseIdentifierExpr();
930 case tok_number: return ParseNumberExpr();
931 case '(': return ParseParenExpr();
936 /// ::= ('+' primary)*
937 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
938 // If this is a binop, find its precedence.
940 int TokPrec = GetTokPrecedence();
942 // If this is a binop that binds at least as tightly as the current binop,
943 // consume it, otherwise we are done.
944 if (TokPrec < ExprPrec)
947 // Okay, we know this is a binop.
949 getNextToken(); // eat binop
951 // Parse the primary expression after the binary operator.
952 ExprAST *RHS = ParsePrimary();
955 // If BinOp binds less tightly with RHS than the operator after RHS, let
956 // the pending operator take RHS as its LHS.
957 int NextPrec = GetTokPrecedence();
958 if (TokPrec < NextPrec) {
959 RHS = ParseBinOpRHS(TokPrec+1, RHS);
960 if (RHS == 0) return 0;
964 LHS = new BinaryExprAST(BinOp, LHS, RHS);
969 /// ::= primary binoprhs
971 static ExprAST *ParseExpression() {
972 ExprAST *LHS = ParsePrimary();
975 return ParseBinOpRHS(0, LHS);
979 /// ::= id '(' id* ')'
980 static PrototypeAST *ParsePrototype() {
981 if (CurTok != tok_identifier)
982 return ErrorP("Expected function name in prototype");
984 std::string FnName = IdentifierStr;
988 return ErrorP("Expected '(' in prototype");
990 std::vector<std::string> ArgNames;
991 while (getNextToken() == tok_identifier)
992 ArgNames.push_back(IdentifierStr);
994 return ErrorP("Expected ')' in prototype");
997 getNextToken(); // eat ')'.
999 return new PrototypeAST(FnName, ArgNames);
1002 /// definition ::= 'def' prototype expression
1003 static FunctionAST *ParseDefinition() {
1004 getNextToken(); // eat def.
1005 PrototypeAST *Proto = ParsePrototype();
1006 if (Proto == 0) return 0;
1008 if (ExprAST *E = ParseExpression())
1009 return new FunctionAST(Proto, E);
1013 /// toplevelexpr ::= expression
1014 static FunctionAST *ParseTopLevelExpr() {
1015 if (ExprAST *E = ParseExpression()) {
1016 // Make an anonymous proto.
1017 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>());
1018 return new FunctionAST(Proto, E);
1023 /// external ::= 'extern' prototype
1024 static PrototypeAST *ParseExtern() {
1025 getNextToken(); // eat extern.
1026 return ParsePrototype();
1029 //===----------------------------------------------------------------------===//
1030 // Top-Level parsing
1031 //===----------------------------------------------------------------------===//
1033 static void HandleDefinition() {
1034 if (ParseDefinition()) {
1035 fprintf(stderr, "Parsed a function definition.\n");
1037 // Skip token for error recovery.
1042 static void HandleExtern() {
1043 if (ParseExtern()) {
1044 fprintf(stderr, "Parsed an extern\n");
1046 // Skip token for error recovery.
1051 static void HandleTopLevelExpression() {
1052 // Evaluate a top-level expression into an anonymous function.
1053 if (ParseTopLevelExpr()) {
1054 fprintf(stderr, "Parsed a top-level expr\n");
1056 // Skip token for error recovery.
1061 /// top ::= definition | external | expression | ';'
1062 static void MainLoop() {
1064 fprintf(stderr, "ready> ");
1066 case tok_eof: return;
1067 case ';': getNextToken(); break; // ignore top-level semicolons.
1068 case tok_def: HandleDefinition(); break;
1069 case tok_extern: HandleExtern(); break;
1070 default: HandleTopLevelExpression(); break;
1075 //===----------------------------------------------------------------------===//
1076 // Main driver code.
1077 //===----------------------------------------------------------------------===//
1080 // Install standard binary operators.
1081 // 1 is lowest precedence.
1082 BinopPrecedence['<'] = 10;
1083 BinopPrecedence['+'] = 20;
1084 BinopPrecedence['-'] = 20;
1085 BinopPrecedence['*'] = 40; // highest.
1087 // Prime the first token.
1088 fprintf(stderr, "ready> ");
1091 // Run the main "interpreter loop" now.
1097 `Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_