docs/tutorial/LangImpl2.rst

   1 ===========================================
   2 Kaleidoscope: Implementing a Parser and AST
   3 ===========================================
   4
   5 .. contents::
   6    :local:
   7
   8 Chapter 2 Introduction
   9 ======================
  10
  11 Welcome to Chapter 2 of the "`Implementing a language with
  12 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
  13 lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
  14 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
  15 language. Once we have a parser, we'll define and build an `Abstract
  16 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
  17
  18 The parser we will build uses a combination of `Recursive Descent
  19 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
  20 `Operator-Precedence
  21 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
  22 parse the Kaleidoscope language (the latter for binary expressions and
  23 the former for everything else). Before we get to parsing though, lets
  24 talk about the output of the parser: the Abstract Syntax Tree.
  25
  26 The Abstract Syntax Tree (AST)
  27 ==============================
  28
  29 The AST for a program captures its behavior in such a way that it is
  30 easy for later stages of the compiler (e.g. code generation) to
  31 interpret. We basically want one object for each construct in the
  32 language, and the AST should closely model the language. In
  33 Kaleidoscope, we have expressions, a prototype, and a function object.
  34 We'll start with expressions first:
  35
  36 .. code-block:: c++
  37
  38     /// ExprAST - Base class for all expression nodes.
  39     class ExprAST {
  40     public:
  41       virtual ~ExprAST() {}
  42     };
  43
  44     /// NumberExprAST - Expression class for numeric literals like "1.0".
  45     class NumberExprAST : public ExprAST {
  46       double Val;
  47
  48     public:
  49       NumberExprAST(double Val) : Val(Val) {}
  50     };
  51
  52 The code above shows the definition of the base ExprAST class and one
  53 subclass which we use for numeric literals. The important thing to note
  54 about this code is that the NumberExprAST class captures the numeric
  55 value of the literal as an instance variable. This allows later phases
  56 of the compiler to know what the stored numeric value is.
  57
  58 Right now we only create the AST, so there are no useful accessor
  59 methods on them. It would be very easy to add a virtual method to pretty
  60 print the code, for example. Here are the other expression AST node
  61 definitions that we'll use in the basic form of the Kaleidoscope
  62 language:
  63
  64 .. code-block:: c++
  65
  66     /// VariableExprAST - Expression class for referencing a variable, like "a".
  67     class VariableExprAST : public ExprAST {
  68       std::string Name;
  69
  70     public:
  71       VariableExprAST(const std::string &Name) : Name(Name) {}
  72     };
  73
  74     /// BinaryExprAST - Expression class for a binary operator.
  75     class BinaryExprAST : public ExprAST {
  76       char Op;
  77       std::unique_ptr<ExprAST> LHS, RHS;
  78
  79     public:
  80       BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
  81                     std::unique_ptr<ExprAST> RHS)
  82         : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
  83     };
  84
  85     /// CallExprAST - Expression class for function calls.
  86     class CallExprAST : public ExprAST {
  87       std::string Callee;
  88       std::vector<std::unique_ptr<ExprAST>> Args;
  89
  90     public:
  91       CallExprAST(const std::string &Callee,
  92                   std::vector<std::unique_ptr<ExprAST>> Args)
  93         : Callee(Callee), Args(std::move(Args)) {}
  94     };
  95
  96 This is all (intentionally) rather straight-forward: variables capture
  97 the variable name, binary operators capture their opcode (e.g. '+'), and
  98 calls capture a function name as well as a list of any argument
  99 expressions. One thing that is nice about our AST is that it captures
 100 the language features without talking about the syntax of the language.
 101 Note that there is no discussion about precedence of binary operators,
 102 lexical structure, etc.
 103
 104 For our basic language, these are all of the expression nodes we'll
 105 define. Because it doesn't have conditional control flow, it isn't
 106 Turing-complete; we'll fix that in a later installment. The two things
 107 we need next are a way to talk about the interface to a function, and a
 108 way to talk about functions themselves:
 109
 110 .. code-block:: c++
 111
 112     /// PrototypeAST - This class represents the "prototype" for a function,
 113     /// which captures its name, and its argument names (thus implicitly the number
 114     /// of arguments the function takes).
 115     class PrototypeAST {
 116       std::string Name;
 117       std::vector<std::string> Args;
 118
 119     public:
 120       PrototypeAST(const std::string &name, std::vector<std::string> Args)
 121         : Name(name), Args(std::move(Args)) {}
 122     };
 123
 124     /// FunctionAST - This class represents a function definition itself.
 125     class FunctionAST {
 126       std::unique_ptr<PrototypeAST> Proto;
 127       std::unique_ptr<ExprAST> Body;
 128
 129     public:
 130       FunctionAST(std::unique_ptr<PrototypeAST> Proto,
 131                   std::unique_ptr<ExprAST> Body)
 132         : Proto(std::move(Proto)), Body(std::move(Body)) {}
 133     };
 134
 135 In Kaleidoscope, functions are typed with just a count of their
 136 arguments. Since all values are double precision floating point, the
 137 type of each argument doesn't need to be stored anywhere. In a more
 138 aggressive and realistic language, the "ExprAST" class would probably
 139 have a type field.
 140
 141 With this scaffolding, we can now talk about parsing expressions and
 142 function bodies in Kaleidoscope.
 143
 144 Parser Basics
 145 =============
 146
 147 Now that we have an AST to build, we need to define the parser code to
 148 build it. The idea here is that we want to parse something like "x+y"
 149 (which is returned as three tokens by the lexer) into an AST that could
 150 be generated with calls like this:
 151
 152 .. code-block:: c++
 153
 154       auto LHS = llvm::make_unique<VariableExprAST>("x");
 155       auto RHS = llvm::make_unique<VariableExprAST>("y");
 156       auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
 157                                                     std::move(RHS));
 158
 159 In order to do this, we'll start by defining some basic helper routines:
 160
 161 .. code-block:: c++
 162
 163     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
 164     /// token the parser is looking at.  getNextToken reads another token from the
 165     /// lexer and updates CurTok with its results.
 166     static int CurTok;
 167     static int getNextToken() {
 168       return CurTok = gettok();
 169     }
 170
 171 This implements a simple token buffer around the lexer. This allows us
 172 to look one token ahead at what the lexer is returning. Every function
 173 in our parser will assume that CurTok is the current token that needs to
 174 be parsed.
 175
 176 .. code-block:: c++
 177
 178
 179     /// Error* - These are little helper functions for error handling.
 180     std::unique_ptr<ExprAST> Error(const char *Str) {
 181       fprintf(stderr, "Error: %s\n", Str);
 182       return nullptr;
 183     }
 184     std::unique_ptr<PrototypeAST> ErrorP(const char *Str) {
 185       Error(Str);
 186       return nullptr;
 187     }
 188
 189 The ``Error`` routines are simple helper routines that our parser will
 190 use to handle errors. The error recovery in our parser will not be the
 191 best and is not particular user-friendly, but it will be enough for our
 192 tutorial. These routines make it easier to handle errors in routines
 193 that have various return types: they always return null.
 194
 195 With these basic helper functions, we can implement the first piece of
 196 our grammar: numeric literals.
 197
 198 Basic Expression Parsing
 199 ========================
 200
 201 We start with numeric literals, because they are the simplest to
 202 process. For each production in our grammar, we'll define a function
 203 which parses that production. For numeric literals, we have:
 204
 205 .. code-block:: c++
 206
 207     /// numberexpr ::= number
 208     static std::unique_ptr<ExprAST> ParseNumberExpr() {
 209       auto Result = llvm::make_unique<NumberExprAST>(NumVal);
 210       getNextToken(); // consume the number
 211       return std::move(Result);
 212     }
 213
 214 This routine is very simple: it expects to be called when the current
 215 token is a ``tok_number`` token. It takes the current number value,
 216 creates a ``NumberExprAST`` node, advances the lexer to the next token,
 217 and finally returns.
 218
 219 There are some interesting aspects to this. The most important one is
 220 that this routine eats all of the tokens that correspond to the
 221 production and returns the lexer buffer with the next token (which is
 222 not part of the grammar production) ready to go. This is a fairly
 223 standard way to go for recursive descent parsers. For a better example,
 224 the parenthesis operator is defined like this:
 225
 226 .. code-block:: c++
 227
 228     /// parenexpr ::= '(' expression ')'
 229     static std::unique_ptr<ExprAST> ParseParenExpr() {
 230       getNextToken(); // eat (.
 231       auto V = ParseExpression();
 232       if (!V)
 233         return nullptr;
 234
 235       if (CurTok != ')')
 236         return Error("expected ')'");
 237       getNextToken(); // eat ).
 238       return V;
 239     }
 240
 241 This function illustrates a number of interesting things about the
 242 parser:
 243
 244 1) It shows how we use the Error routines. When called, this function
 245 expects that the current token is a '(' token, but after parsing the
 246 subexpression, it is possible that there is no ')' waiting. For example,
 247 if the user types in "(4 x" instead of "(4)", the parser should emit an
 248 error. Because errors can occur, the parser needs a way to indicate that
 249 they happened: in our parser, we return null on an error.
 250
 251 2) Another interesting aspect of this function is that it uses recursion
 252 by calling ``ParseExpression`` (we will soon see that
 253 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
 254 because it allows us to handle recursive grammars, and keeps each
 255 production very simple. Note that parentheses do not cause construction
 256 of AST nodes themselves. While we could do it this way, the most
 257 important role of parentheses are to guide the parser and provide
 258 grouping. Once the parser constructs the AST, parentheses are not
 259 needed.
 260
 261 The next simple production is for handling variable references and
 262 function calls:
 263
 264 .. code-block:: c++
 265
 266     /// identifierexpr
 267     ///   ::= identifier
 268     ///   ::= identifier '(' expression* ')'
 269     static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
 270       std::string IdName = IdentifierStr;
 271
 272       getNextToken();  // eat identifier.
 273
 274       if (CurTok != '(') // Simple variable ref.
 275         return llvm::make_unique<VariableExprAST>(IdName);
 276
 277       // Call.
 278       getNextToken();  // eat (
 279       std::vector<std::unique_ptr<ExprAST>> Args;
 280       if (CurTok != ')') {
 281         while (1) {
 282           if (auto Arg = ParseExpression())
 283             Args.push_back(std::move(Arg));
 284           else
 285             return nullptr;
 286
 287           if (CurTok == ')')
 288             break;
 289
 290           if (CurTok != ',')
 291             return Error("Expected ')' or ',' in argument list");
 292           getNextToken();
 293         }
 294       }
 295
 296       // Eat the ')'.
 297       getNextToken();
 298
 299       return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
 300     }
 301
 302 This routine follows the same style as the other routines. (It expects
 303 to be called if the current token is a ``tok_identifier`` token). It
 304 also has recursion and error handling. One interesting aspect of this is
 305 that it uses *look-ahead* to determine if the current identifier is a
 306 stand alone variable reference or if it is a function call expression.
 307 It handles this by checking to see if the token after the identifier is
 308 a '(' token, constructing either a ``VariableExprAST`` or
 309 ``CallExprAST`` node as appropriate.
 310
 311 Now that we have all of our simple expression-parsing logic in place, we
 312 can define a helper function to wrap it together into one entry point.
 313 We call this class of expressions "primary" expressions, for reasons
 314 that will become more clear `later in the
 315 tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary
 316 primary expression, we need to determine what sort of expression it is:
 317
 318 .. code-block:: c++
 319
 320     /// primary
 321     ///   ::= identifierexpr
 322     ///   ::= numberexpr
 323     ///   ::= parenexpr
 324     static std::unique_ptr<ExprAST> ParsePrimary() {
 325       switch (CurTok) {
 326       default:
 327         return Error("unknown token when expecting an expression");
 328       case tok_identifier:
 329         return ParseIdentifierExpr();
 330       case tok_number:
 331         return ParseNumberExpr();
 332       case '(':
 333         return ParseParenExpr();
 334       }
 335     }
 336
 337 Now that you see the definition of this function, it is more obvious why
 338 we can assume the state of CurTok in the various functions. This uses
 339 look-ahead to determine which sort of expression is being inspected, and
 340 then parses it with a function call.
 341
 342 Now that basic expressions are handled, we need to handle binary
 343 expressions. They are a bit more complex.
 344
 345 Binary Expression Parsing
 346 =========================
 347
 348 Binary expressions are significantly harder to parse because they are
 349 often ambiguous. For example, when given the string "x+y\*z", the parser
 350 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
 351 definitions from mathematics, we expect the later parse, because "\*"
 352 (multiplication) has higher *precedence* than "+" (addition).
 353
 354 There are many ways to handle this, but an elegant and efficient way is
 355 to use `Operator-Precedence
 356 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
 357 This parsing technique uses the precedence of binary operators to guide
 358 recursion. To start with, we need a table of precedences:
 359
 360 .. code-block:: c++
 361
 362     /// BinopPrecedence - This holds the precedence for each binary operator that is
 363     /// defined.
 364     static std::map<char, int> BinopPrecedence;
 365
 366     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
 367     static int GetTokPrecedence() {
 368       if (!isascii(CurTok))
 369         return -1;
 370
 371       // Make sure it's a declared binop.
 372       int TokPrec = BinopPrecedence[CurTok];
 373       if (TokPrec <= 0) return -1;
 374       return TokPrec;
 375     }
 376
 377     int main() {
 378       // Install standard binary operators.
 379       // 1 is lowest precedence.
 380       BinopPrecedence['<'] = 10;
 381       BinopPrecedence['+'] = 20;
 382       BinopPrecedence['-'] = 20;
 383       BinopPrecedence['*'] = 40;  // highest.
 384       ...
 385     }
 386
 387 For the basic form of Kaleidoscope, we will only support 4 binary
 388 operators (this can obviously be extended by you, our brave and intrepid
 389 reader). The ``GetTokPrecedence`` function returns the precedence for
 390 the current token, or -1 if the token is not a binary operator. Having a
 391 map makes it easy to add new operators and makes it clear that the
 392 algorithm doesn't depend on the specific operators involved, but it
 393 would be easy enough to eliminate the map and do the comparisons in the
 394 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
 395
 396 With the helper above defined, we can now start parsing binary
 397 expressions. The basic idea of operator precedence parsing is to break
 398 down an expression with potentially ambiguous binary operators into
 399 pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g".
 400 Operator precedence parsing considers this as a stream of primary
 401 expressions separated by binary operators. As such, it will first parse
 402 the leading primary expression "a", then it will see the pairs [+, b]
 403 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
 404 primary expressions, the binary expression parser doesn't need to worry
 405 about nested subexpressions like (c+d) at all.
 406
 407 To start, an expression is a primary expression potentially followed by
 408 a sequence of [binop,primaryexpr] pairs:
 409
 410 .. code-block:: c++
 411
 412     /// expression
 413     ///   ::= primary binoprhs
 414     ///
 415     static std::unique_ptr<ExprAST> ParseExpression() {
 416       auto LHS = ParsePrimary();
 417       if (!LHS)
 418         return nullptr;
 419
 420       return ParseBinOpRHS(0, std::move(LHS));
 421     }
 422
 423 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
 424 us. It takes a precedence and a pointer to an expression for the part
 425 that has been parsed so far. Note that "x" is a perfectly valid
 426 expression: As such, "binoprhs" is allowed to be empty, in which case it
 427 returns the expression that is passed into it. In our example above, the
 428 code passes the expression for "a" into ``ParseBinOpRHS`` and the
 429 current token is "+".
 430
 431 The precedence value passed into ``ParseBinOpRHS`` indicates the
 432 *minimal operator precedence* that the function is allowed to eat. For
 433 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
 434 passed in a precedence of 40, it will not consume any tokens (because
 435 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
 436 starts with:
 437
 438 .. code-block:: c++
 439
 440     /// binoprhs
 441     ///   ::= ('+' primary)*
 442     static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
 443                                                   std::unique_ptr<ExprAST> LHS) {
 444       // If this is a binop, find its precedence.
 445       while (1) {
 446         int TokPrec = GetTokPrecedence();
 447
 448         // If this is a binop that binds at least as tightly as the current binop,
 449         // consume it, otherwise we are done.
 450         if (TokPrec < ExprPrec)
 451           return LHS;
 452
 453 This code gets the precedence of the current token and checks to see if
 454 if is too low. Because we defined invalid tokens to have a precedence of
 455 -1, this check implicitly knows that the pair-stream ends when the token
 456 stream runs out of binary operators. If this check succeeds, we know
 457 that the token is a binary operator and that it will be included in this
 458 expression:
 459
 460 .. code-block:: c++
 461
 462         // Okay, we know this is a binop.
 463         int BinOp = CurTok;
 464         getNextToken();  // eat binop
 465
 466         // Parse the primary expression after the binary operator.
 467         auto RHS = ParsePrimary();
 468         if (!RHS)
 469           return nullptr;
 470
 471 As such, this code eats (and remembers) the binary operator and then
 472 parses the primary expression that follows. This builds up the whole
 473 pair, the first of which is [+, b] for the running example.
 474
 475 Now that we parsed the left-hand side of an expression and one pair of
 476 the RHS sequence, we have to decide which way the expression associates.
 477 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
 478 unparsed)". To determine this, we look ahead at "binop" to determine its
 479 precedence and compare it to BinOp's precedence (which is '+' in this
 480 case):
 481
 482 .. code-block:: c++
 483
 484         // If BinOp binds less tightly with RHS than the operator after RHS, let
 485         // the pending operator take RHS as its LHS.
 486         int NextPrec = GetTokPrecedence();
 487         if (TokPrec < NextPrec) {
 488
 489 If the precedence of the binop to the right of "RHS" is lower or equal
 490 to the precedence of our current operator, then we know that the
 491 parentheses associate as "(a+b) binop ...". In our example, the current
 492 operator is "+" and the next operator is "+", we know that they have the
 493 same precedence. In this case we'll create the AST node for "a+b", and
 494 then continue parsing:
 495
 496 .. code-block:: c++
 497
 498           ... if body omitted ...
 499         }
 500
 501         // Merge LHS/RHS.
 502         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 503                                                std::move(RHS));
 504       }  // loop around to the top of the while loop.
 505     }
 506
 507 In our example above, this will turn "a+b+" into "(a+b)" and execute the
 508 next iteration of the loop, with "+" as the current token. The code
 509 above will eat, remember, and parse "(c+d)" as the primary expression,
 510 which makes the current pair equal to [+, (c+d)]. It will then evaluate
 511 the 'if' conditional above with "\*" as the binop to the right of the
 512 primary. In this case, the precedence of "\*" is higher than the
 513 precedence of "+" so the if condition will be entered.
 514
 515 The critical question left here is "how can the if condition parse the
 516 right hand side in full"? In particular, to build the AST correctly for
 517 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
 518 variable. The code to do this is surprisingly simple (code from the
 519 above two blocks duplicated for context):
 520
 521 .. code-block:: c++
 522
 523         // If BinOp binds less tightly with RHS than the operator after RHS, let
 524         // the pending operator take RHS as its LHS.
 525         int NextPrec = GetTokPrecedence();
 526         if (TokPrec < NextPrec) {
 527           RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
 528           if (!RHS)
 529             return nullptr;
 530         }
 531         // Merge LHS/RHS.
 532         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 533                                                std::move(RHS));
 534       }  // loop around to the top of the while loop.
 535     }
 536
 537 At this point, we know that the binary operator to the RHS of our
 538 primary has higher precedence than the binop we are currently parsing.
 539 As such, we know that any sequence of pairs whose operators are all
 540 higher precedence than "+" should be parsed together and returned as
 541 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
 542 specifying "TokPrec+1" as the minimum precedence required for it to
 543 continue. In our example above, this will cause it to return the AST
 544 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
 545 expression.
 546
 547 Finally, on the next iteration of the while loop, the "+g" piece is
 548 parsed and added to the AST. With this little bit of code (14
 549 non-trivial lines), we correctly handle fully general binary expression
 550 parsing in a very elegant way. This was a whirlwind tour of this code,
 551 and it is somewhat subtle. I recommend running through it with a few
 552 tough examples to see how it works.
 553
 554 This wraps up handling of expressions. At this point, we can point the
 555 parser at an arbitrary token stream and build an expression from it,
 556 stopping at the first token that is not part of the expression. Next up
 557 we need to handle function definitions, etc.
 558
 559 Parsing the Rest
 560 ================
 561
 562 The next thing missing is handling of function prototypes. In
 563 Kaleidoscope, these are used both for 'extern' function declarations as
 564 well as function body definitions. The code to do this is
 565 straight-forward and not very interesting (once you've survived
 566 expressions):
 567
 568 .. code-block:: c++
 569
 570     /// prototype
 571     ///   ::= id '(' id* ')'
 572     static std::unique_ptr<PrototypeAST> ParsePrototype() {
 573       if (CurTok != tok_identifier)
 574         return ErrorP("Expected function name in prototype");
 575
 576       std::string FnName = IdentifierStr;
 577       getNextToken();
 578
 579       if (CurTok != '(')
 580         return ErrorP("Expected '(' in prototype");
 581
 582       // Read the list of argument names.
 583       std::vector<std::string> ArgNames;
 584       while (getNextToken() == tok_identifier)
 585         ArgNames.push_back(IdentifierStr);
 586       if (CurTok != ')')
 587         return ErrorP("Expected ')' in prototype");
 588
 589       // success.
 590       getNextToken();  // eat ')'.
 591
 592       return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
 593     }
 594
 595 Given this, a function definition is very simple, just a prototype plus
 596 an expression to implement the body:
 597
 598 .. code-block:: c++
 599
 600     /// definition ::= 'def' prototype expression
 601     static std::unique_ptr<FunctionAST> ParseDefinition() {
 602       getNextToken();  // eat def.
 603       auto Proto = ParsePrototype();
 604       if (!Proto) return nullptr;
 605
 606       if (auto E = ParseExpression())
 607         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 608       return nullptr;
 609     }
 610
 611 In addition, we support 'extern' to declare functions like 'sin' and
 612 'cos' as well as to support forward declaration of user functions. These
 613 'extern's are just prototypes with no body:
 614
 615 .. code-block:: c++
 616
 617     /// external ::= 'extern' prototype
 618     static std::unique_ptr<PrototypeAST> ParseExtern() {
 619       getNextToken();  // eat extern.
 620       return ParsePrototype();
 621     }
 622
 623 Finally, we'll also let the user type in arbitrary top-level expressions
 624 and evaluate them on the fly. We will handle this by defining anonymous
 625 nullary (zero argument) functions for them:
 626
 627 .. code-block:: c++
 628
 629     /// toplevelexpr ::= expression
 630     static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
 631       if (auto E = ParseExpression()) {
 632         // Make an anonymous proto.
 633         auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
 634         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 635       }
 636       return nullptr;
 637     }
 638
 639 Now that we have all the pieces, let's build a little driver that will
 640 let us actually *execute* this code we've built!
 641
 642 The Driver
 643 ==========
 644
 645 The driver for this simply invokes all of the parsing pieces with a
 646 top-level dispatch loop. There isn't much interesting here, so I'll just
 647 include the top-level loop. See `below <#code>`_ for full code in the
 648 "Top-Level Parsing" section.
 649
 650 .. code-block:: c++
 651
 652     /// top ::= definition | external | expression | ';'
 653     static void MainLoop() {
 654       while (1) {
 655         fprintf(stderr, "ready> ");
 656         switch (CurTok) {
 657         case tok_eof:
 658           return;
 659         case ';': // ignore top-level semicolons.
 660           getNextToken();
 661           break;
 662         case tok_def:
 663           HandleDefinition();
 664           break;
 665         case tok_extern:
 666           HandleExtern();
 667           break;
 668         default:
 669           HandleTopLevelExpression();
 670           break;
 671         }
 672       }
 673     }
 674
 675 The most interesting part of this is that we ignore top-level
 676 semicolons. Why is this, you ask? The basic reason is that if you type
 677 "4 + 5" at the command line, the parser doesn't know whether that is the
 678 end of what you will type or not. For example, on the next line you
 679 could type "def foo..." in which case 4+5 is the end of a top-level
 680 expression. Alternatively you could type "\* 6", which would continue
 681 the expression. Having top-level semicolons allows you to type "4+5;",
 682 and the parser will know you are done.
 683
 684 Conclusions
 685 ===========
 686
 687 With just under 400 lines of commented code (240 lines of non-comment,
 688 non-blank code), we fully defined our minimal language, including a
 689 lexer, parser, and AST builder. With this done, the executable will
 690 validate Kaleidoscope code and tell us if it is grammatically invalid.
 691 For example, here is a sample interaction:
 692
 693 .. code-block:: bash
 694
 695     $ ./a.out
 696     ready> def foo(x y) x+foo(y, 4.0);
 697     Parsed a function definition.
 698     ready> def foo(x y) x+y y;
 699     Parsed a function definition.
 700     Parsed a top-level expr
 701     ready> def foo(x y) x+y );
 702     Parsed a function definition.
 703     Error: unknown token when expecting an expression
 704     ready> extern sin(a);
 705     ready> Parsed an extern
 706     ready> ^D
 707     $
 708
 709 There is a lot of room for extension here. You can define new AST nodes,
 710 extend the language in many ways, etc. In the `next
 711 installment <LangImpl3.html>`_, we will describe how to generate LLVM
 712 Intermediate Representation (IR) from the AST.
 713
 714 Full Code Listing
 715 =================
 716
 717 Here is the complete code listing for this and the previous chapter.
 718 Note that it is fully self-contained: you don't need LLVM or any
 719 external libraries at all for this. (Besides the C and C++ standard
 720 libraries, of course.) To build this, just compile with:
 721
 722 .. code-block:: bash
 723
 724     # Compile
 725     clang++ -g -O3 toy.cpp
 726     # Run
 727     ./a.out
 728
 729 Here is the code:
 730
 731 .. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
 732    :language: c++
 733
 734 `Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_
 735