docs/tutorial/LangImpl2.rst

   1 ===========================================
   2 Kaleidoscope: Implementing a Parser and AST
   3 ===========================================
   4
   5 .. contents::
   6    :local:
   7
   8 Chapter 2 Introduction
   9 ======================
  10
  11 Welcome to Chapter 2 of the "`Implementing a language with
  12 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
  13 lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
  14 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
  15 language. Once we have a parser, we'll define and build an `Abstract
  16 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
  17
  18 The parser we will build uses a combination of `Recursive Descent
  19 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
  20 `Operator-Precedence
  21 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
  22 parse the Kaleidoscope language (the latter for binary expressions and
  23 the former for everything else). Before we get to parsing though, lets
  24 talk about the output of the parser: the Abstract Syntax Tree.
  25
  26 The Abstract Syntax Tree (AST)
  27 ==============================
  28
  29 The AST for a program captures its behavior in such a way that it is
  30 easy for later stages of the compiler (e.g. code generation) to
  31 interpret. We basically want one object for each construct in the
  32 language, and the AST should closely model the language. In
  33 Kaleidoscope, we have expressions, a prototype, and a function object.
  34 We'll start with expressions first:
  35
  36 .. code-block:: c++
  37
  38     /// ExprAST - Base class for all expression nodes.
  39     class ExprAST {
  40     public:
  41       virtual ~ExprAST() {}
  42     };
  43
  44     /// NumberExprAST - Expression class for numeric literals like "1.0".
  45     class NumberExprAST : public ExprAST {
  46       double Val;
  47     public:
  48       NumberExprAST(double Val) : Val(Val) {}
  49     };
  50
  51 The code above shows the definition of the base ExprAST class and one
  52 subclass which we use for numeric literals. The important thing to note
  53 about this code is that the NumberExprAST class captures the numeric
  54 value of the literal as an instance variable. This allows later phases
  55 of the compiler to know what the stored numeric value is.
  56
  57 Right now we only create the AST, so there are no useful accessor
  58 methods on them. It would be very easy to add a virtual method to pretty
  59 print the code, for example. Here are the other expression AST node
  60 definitions that we'll use in the basic form of the Kaleidoscope
  61 language:
  62
  63 .. code-block:: c++
  64
  65     /// VariableExprAST - Expression class for referencing a variable, like "a".
  66     class VariableExprAST : public ExprAST {
  67       std::string Name;
  68     public:
  69       VariableExprAST(const std::string &Name) : Name(Name) {}
  70     };
  71
  72     /// BinaryExprAST - Expression class for a binary operator.
  73     class BinaryExprAST : public ExprAST {
  74       char Op;
  75       std::unique_ptr<ExprAST> LHS, RHS;
  76     public:
  77       BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
  78                     std::unique_ptr<ExprAST> RHS)
  79         : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
  80     };
  81
  82     /// CallExprAST - Expression class for function calls.
  83     class CallExprAST : public ExprAST {
  84       std::string Callee;
  85       std::vector<ExprAST*> Args;
  86     public:
  87       CallExprAST(const std::string &Callee,
  88                   std::vector<std::unique_ptr<ExprAST>> Args)
  89         : Callee(Callee), Args(std::move(Args)) {}
  90     };
  91
  92 This is all (intentionally) rather straight-forward: variables capture
  93 the variable name, binary operators capture their opcode (e.g. '+'), and
  94 calls capture a function name as well as a list of any argument
  95 expressions. One thing that is nice about our AST is that it captures
  96 the language features without talking about the syntax of the language.
  97 Note that there is no discussion about precedence of binary operators,
  98 lexical structure, etc.
  99
 100 For our basic language, these are all of the expression nodes we'll
 101 define. Because it doesn't have conditional control flow, it isn't
 102 Turing-complete; we'll fix that in a later installment. The two things
 103 we need next are a way to talk about the interface to a function, and a
 104 way to talk about functions themselves:
 105
 106 .. code-block:: c++
 107
 108     /// PrototypeAST - This class represents the "prototype" for a function,
 109     /// which captures its name, and its argument names (thus implicitly the number
 110     /// of arguments the function takes).
 111     class PrototypeAST {
 112       std::string Name;
 113       std::vector<std::string> Args;
 114     public:
 115       PrototypeAST(const std::string &name, std::vector<std::string> Args)
 116         : Name(name), Args(std::move(Args)) {}
 117     };
 118
 119     /// FunctionAST - This class represents a function definition itself.
 120     class FunctionAST {
 121       std::unique_ptr<PrototypeAST> Proto;
 122       std::unique_ptr<ExprAST> Body;
 123     public:
 124       FunctionAST(std::unique_ptr<PrototypeAST> Proto,
 125                   std::unique_ptr<ExprAST> Body)
 126         : Proto(std::move(Proto)), Body(std::move(Body)) {}
 127     };
 128
 129 In Kaleidoscope, functions are typed with just a count of their
 130 arguments. Since all values are double precision floating point, the
 131 type of each argument doesn't need to be stored anywhere. In a more
 132 aggressive and realistic language, the "ExprAST" class would probably
 133 have a type field.
 134
 135 With this scaffolding, we can now talk about parsing expressions and
 136 function bodies in Kaleidoscope.
 137
 138 Parser Basics
 139 =============
 140
 141 Now that we have an AST to build, we need to define the parser code to
 142 build it. The idea here is that we want to parse something like "x+y"
 143 (which is returned as three tokens by the lexer) into an AST that could
 144 be generated with calls like this:
 145
 146 .. code-block:: c++
 147
 148       auto LHS = llvm::make_unique<VariableExprAST>("x");
 149       auto RHS = llvm::make_unique<VariableExprAST>("y");
 150       auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
 151                                                     std::move(RHS));
 152
 153 In order to do this, we'll start by defining some basic helper routines:
 154
 155 .. code-block:: c++
 156
 157     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
 158     /// token the parser is looking at.  getNextToken reads another token from the
 159     /// lexer and updates CurTok with its results.
 160     static int CurTok;
 161     static int getNextToken() {
 162       return CurTok = gettok();
 163     }
 164
 165 This implements a simple token buffer around the lexer. This allows us
 166 to look one token ahead at what the lexer is returning. Every function
 167 in our parser will assume that CurTok is the current token that needs to
 168 be parsed.
 169
 170 .. code-block:: c++
 171
 172
 173     /// Error* - These are little helper functions for error handling.
 174     ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
 175     PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
 176     FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
 177
 178 The ``Error`` routines are simple helper routines that our parser will
 179 use to handle errors. The error recovery in our parser will not be the
 180 best and is not particular user-friendly, but it will be enough for our
 181 tutorial. These routines make it easier to handle errors in routines
 182 that have various return types: they always return null.
 183
 184 With these basic helper functions, we can implement the first piece of
 185 our grammar: numeric literals.
 186
 187 Basic Expression Parsing
 188 ========================
 189
 190 We start with numeric literals, because they are the simplest to
 191 process. For each production in our grammar, we'll define a function
 192 which parses that production. For numeric literals, we have:
 193
 194 .. code-block:: c++
 195
 196     /// numberexpr ::= number
 197     static std::unique_ptr<ExprAST> ParseNumberExpr() {
 198       auto Result = llvm::make_unique<NumberExprAST>(NumVal);
 199       getNextToken(); // consume the number
 200       return std::move(Result);
 201     }
 202
 203 This routine is very simple: it expects to be called when the current
 204 token is a ``tok_number`` token. It takes the current number value,
 205 creates a ``NumberExprAST`` node, advances the lexer to the next token,
 206 and finally returns.
 207
 208 There are some interesting aspects to this. The most important one is
 209 that this routine eats all of the tokens that correspond to the
 210 production and returns the lexer buffer with the next token (which is
 211 not part of the grammar production) ready to go. This is a fairly
 212 standard way to go for recursive descent parsers. For a better example,
 213 the parenthesis operator is defined like this:
 214
 215 .. code-block:: c++
 216
 217     /// parenexpr ::= '(' expression ')'
 218     static std::unique_ptr<ExprAST> ParseParenExpr() {
 219       getNextToken();  // eat (.
 220       auto V = ParseExpression();
 221       if (!V) return nullptr;
 222
 223       if (CurTok != ')')
 224         return Error("expected ')'");
 225       getNextToken();  // eat ).
 226       return V;
 227     }
 228
 229 This function illustrates a number of interesting things about the
 230 parser:
 231
 232 1) It shows how we use the Error routines. When called, this function
 233 expects that the current token is a '(' token, but after parsing the
 234 subexpression, it is possible that there is no ')' waiting. For example,
 235 if the user types in "(4 x" instead of "(4)", the parser should emit an
 236 error. Because errors can occur, the parser needs a way to indicate that
 237 they happened: in our parser, we return null on an error.
 238
 239 2) Another interesting aspect of this function is that it uses recursion
 240 by calling ``ParseExpression`` (we will soon see that
 241 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
 242 because it allows us to handle recursive grammars, and keeps each
 243 production very simple. Note that parentheses do not cause construction
 244 of AST nodes themselves. While we could do it this way, the most
 245 important role of parentheses are to guide the parser and provide
 246 grouping. Once the parser constructs the AST, parentheses are not
 247 needed.
 248
 249 The next simple production is for handling variable references and
 250 function calls:
 251
 252 .. code-block:: c++
 253
 254     /// identifierexpr
 255     ///   ::= identifier
 256     ///   ::= identifier '(' expression* ')'
 257     static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
 258       std::string IdName = IdentifierStr;
 259
 260       getNextToken();  // eat identifier.
 261
 262       if (CurTok != '(') // Simple variable ref.
 263         return llvm::make_unique<VariableExprAST>(IdName);
 264
 265       // Call.
 266       getNextToken();  // eat (
 267       std::vector<std::unique_ptr<ExprAST>> Args;
 268       if (CurTok != ')') {
 269         while (1) {
 270           auto Arg = ParseExpression();
 271           if (!Arg) return nullptr;
 272           Args.push_back(std::move(Arg));
 273
 274           if (CurTok == ')') break;
 275
 276           if (CurTok != ',')
 277             return Error("Expected ')' or ',' in argument list");
 278           getNextToken();
 279         }
 280       }
 281
 282       // Eat the ')'.
 283       getNextToken();
 284
 285       return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
 286     }
 287
 288 This routine follows the same style as the other routines. (It expects
 289 to be called if the current token is a ``tok_identifier`` token). It
 290 also has recursion and error handling. One interesting aspect of this is
 291 that it uses *look-ahead* to determine if the current identifier is a
 292 stand alone variable reference or if it is a function call expression.
 293 It handles this by checking to see if the token after the identifier is
 294 a '(' token, constructing either a ``VariableExprAST`` or
 295 ``CallExprAST`` node as appropriate.
 296
 297 Now that we have all of our simple expression-parsing logic in place, we
 298 can define a helper function to wrap it together into one entry point.
 299 We call this class of expressions "primary" expressions, for reasons
 300 that will become more clear `later in the
 301 tutorial <LangImpl6.html#unary>`_. In order to parse an arbitrary
 302 primary expression, we need to determine what sort of expression it is:
 303
 304 .. code-block:: c++
 305
 306     /// primary
 307     ///   ::= identifierexpr
 308     ///   ::= numberexpr
 309     ///   ::= parenexpr
 310     static std::unique_ptr<ExprAST> ParsePrimary() {
 311       switch (CurTok) {
 312       default: return Error("unknown token when expecting an expression");
 313       case tok_identifier: return ParseIdentifierExpr();
 314       case tok_number:     return ParseNumberExpr();
 315       case '(':            return ParseParenExpr();
 316       }
 317     }
 318
 319 Now that you see the definition of this function, it is more obvious why
 320 we can assume the state of CurTok in the various functions. This uses
 321 look-ahead to determine which sort of expression is being inspected, and
 322 then parses it with a function call.
 323
 324 Now that basic expressions are handled, we need to handle binary
 325 expressions. They are a bit more complex.
 326
 327 Binary Expression Parsing
 328 =========================
 329
 330 Binary expressions are significantly harder to parse because they are
 331 often ambiguous. For example, when given the string "x+y\*z", the parser
 332 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
 333 definitions from mathematics, we expect the later parse, because "\*"
 334 (multiplication) has higher *precedence* than "+" (addition).
 335
 336 There are many ways to handle this, but an elegant and efficient way is
 337 to use `Operator-Precedence
 338 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
 339 This parsing technique uses the precedence of binary operators to guide
 340 recursion. To start with, we need a table of precedences:
 341
 342 .. code-block:: c++
 343
 344     /// BinopPrecedence - This holds the precedence for each binary operator that is
 345     /// defined.
 346     static std::map<char, int> BinopPrecedence;
 347
 348     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
 349     static int GetTokPrecedence() {
 350       if (!isascii(CurTok))
 351         return -1;
 352
 353       // Make sure it's a declared binop.
 354       int TokPrec = BinopPrecedence[CurTok];
 355       if (TokPrec <= 0) return -1;
 356       return TokPrec;
 357     }
 358
 359     int main() {
 360       // Install standard binary operators.
 361       // 1 is lowest precedence.
 362       BinopPrecedence['<'] = 10;
 363       BinopPrecedence['+'] = 20;
 364       BinopPrecedence['-'] = 20;
 365       BinopPrecedence['*'] = 40;  // highest.
 366       ...
 367     }
 368
 369 For the basic form of Kaleidoscope, we will only support 4 binary
 370 operators (this can obviously be extended by you, our brave and intrepid
 371 reader). The ``GetTokPrecedence`` function returns the precedence for
 372 the current token, or -1 if the token is not a binary operator. Having a
 373 map makes it easy to add new operators and makes it clear that the
 374 algorithm doesn't depend on the specific operators involved, but it
 375 would be easy enough to eliminate the map and do the comparisons in the
 376 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
 377
 378 With the helper above defined, we can now start parsing binary
 379 expressions. The basic idea of operator precedence parsing is to break
 380 down an expression with potentially ambiguous binary operators into
 381 pieces. Consider ,for example, the expression "a+b+(c+d)\*e\*f+g".
 382 Operator precedence parsing considers this as a stream of primary
 383 expressions separated by binary operators. As such, it will first parse
 384 the leading primary expression "a", then it will see the pairs [+, b]
 385 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
 386 primary expressions, the binary expression parser doesn't need to worry
 387 about nested subexpressions like (c+d) at all.
 388
 389 To start, an expression is a primary expression potentially followed by
 390 a sequence of [binop,primaryexpr] pairs:
 391
 392 .. code-block:: c++
 393
 394     /// expression
 395     ///   ::= primary binoprhs
 396     ///
 397     static std::unique_ptr<ExprAST> ParseExpression() {
 398       auto LHS = ParsePrimary();
 399       if (!LHS) return nullptr;
 400
 401       return ParseBinOpRHS(0, std::move(LHS));
 402     }
 403
 404 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
 405 us. It takes a precedence and a pointer to an expression for the part
 406 that has been parsed so far. Note that "x" is a perfectly valid
 407 expression: As such, "binoprhs" is allowed to be empty, in which case it
 408 returns the expression that is passed into it. In our example above, the
 409 code passes the expression for "a" into ``ParseBinOpRHS`` and the
 410 current token is "+".
 411
 412 The precedence value passed into ``ParseBinOpRHS`` indicates the
 413 *minimal operator precedence* that the function is allowed to eat. For
 414 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
 415 passed in a precedence of 40, it will not consume any tokens (because
 416 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
 417 starts with:
 418
 419 .. code-block:: c++
 420
 421     /// binoprhs
 422     ///   ::= ('+' primary)*
 423     static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
 424                                                   std::unique_ptr<ExprAST> LHS) {
 425       // If this is a binop, find its precedence.
 426       while (1) {
 427         int TokPrec = GetTokPrecedence();
 428
 429         // If this is a binop that binds at least as tightly as the current binop,
 430         // consume it, otherwise we are done.
 431         if (TokPrec < ExprPrec)
 432           return LHS;
 433
 434 This code gets the precedence of the current token and checks to see if
 435 if is too low. Because we defined invalid tokens to have a precedence of
 436 -1, this check implicitly knows that the pair-stream ends when the token
 437 stream runs out of binary operators. If this check succeeds, we know
 438 that the token is a binary operator and that it will be included in this
 439 expression:
 440
 441 .. code-block:: c++
 442
 443         // Okay, we know this is a binop.
 444         int BinOp = CurTok;
 445         getNextToken();  // eat binop
 446
 447         // Parse the primary expression after the binary operator.
 448         auto RHS = ParsePrimary();
 449         if (!RHS) return nullptr;
 450
 451 As such, this code eats (and remembers) the binary operator and then
 452 parses the primary expression that follows. This builds up the whole
 453 pair, the first of which is [+, b] for the running example.
 454
 455 Now that we parsed the left-hand side of an expression and one pair of
 456 the RHS sequence, we have to decide which way the expression associates.
 457 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
 458 unparsed)". To determine this, we look ahead at "binop" to determine its
 459 precedence and compare it to BinOp's precedence (which is '+' in this
 460 case):
 461
 462 .. code-block:: c++
 463
 464         // If BinOp binds less tightly with RHS than the operator after RHS, let
 465         // the pending operator take RHS as its LHS.
 466         int NextPrec = GetTokPrecedence();
 467         if (TokPrec < NextPrec) {
 468
 469 If the precedence of the binop to the right of "RHS" is lower or equal
 470 to the precedence of our current operator, then we know that the
 471 parentheses associate as "(a+b) binop ...". In our example, the current
 472 operator is "+" and the next operator is "+", we know that they have the
 473 same precedence. In this case we'll create the AST node for "a+b", and
 474 then continue parsing:
 475
 476 .. code-block:: c++
 477
 478           ... if body omitted ...
 479         }
 480
 481         // Merge LHS/RHS.
 482         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 483                                                std::move(RHS));
 484       }  // loop around to the top of the while loop.
 485     }
 486
 487 In our example above, this will turn "a+b+" into "(a+b)" and execute the
 488 next iteration of the loop, with "+" as the current token. The code
 489 above will eat, remember, and parse "(c+d)" as the primary expression,
 490 which makes the current pair equal to [+, (c+d)]. It will then evaluate
 491 the 'if' conditional above with "\*" as the binop to the right of the
 492 primary. In this case, the precedence of "\*" is higher than the
 493 precedence of "+" so the if condition will be entered.
 494
 495 The critical question left here is "how can the if condition parse the
 496 right hand side in full"? In particular, to build the AST correctly for
 497 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
 498 variable. The code to do this is surprisingly simple (code from the
 499 above two blocks duplicated for context):
 500
 501 .. code-block:: c++
 502
 503         // If BinOp binds less tightly with RHS than the operator after RHS, let
 504         // the pending operator take RHS as its LHS.
 505         int NextPrec = GetTokPrecedence();
 506         if (TokPrec < NextPrec) {
 507           RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
 508           if (RHS == 0) return 0;
 509         }
 510         // Merge LHS/RHS.
 511         LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 512                                                std::move(RHS));
 513       }  // loop around to the top of the while loop.
 514     }
 515
 516 At this point, we know that the binary operator to the RHS of our
 517 primary has higher precedence than the binop we are currently parsing.
 518 As such, we know that any sequence of pairs whose operators are all
 519 higher precedence than "+" should be parsed together and returned as
 520 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
 521 specifying "TokPrec+1" as the minimum precedence required for it to
 522 continue. In our example above, this will cause it to return the AST
 523 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
 524 expression.
 525
 526 Finally, on the next iteration of the while loop, the "+g" piece is
 527 parsed and added to the AST. With this little bit of code (14
 528 non-trivial lines), we correctly handle fully general binary expression
 529 parsing in a very elegant way. This was a whirlwind tour of this code,
 530 and it is somewhat subtle. I recommend running through it with a few
 531 tough examples to see how it works.
 532
 533 This wraps up handling of expressions. At this point, we can point the
 534 parser at an arbitrary token stream and build an expression from it,
 535 stopping at the first token that is not part of the expression. Next up
 536 we need to handle function definitions, etc.
 537
 538 Parsing the Rest
 539 ================
 540
 541 The next thing missing is handling of function prototypes. In
 542 Kaleidoscope, these are used both for 'extern' function declarations as
 543 well as function body definitions. The code to do this is
 544 straight-forward and not very interesting (once you've survived
 545 expressions):
 546
 547 .. code-block:: c++
 548
 549     /// prototype
 550     ///   ::= id '(' id* ')'
 551     static std::unique_ptr<PrototypeAST> ParsePrototype() {
 552       if (CurTok != tok_identifier)
 553         return ErrorP("Expected function name in prototype");
 554
 555       std::string FnName = IdentifierStr;
 556       getNextToken();
 557
 558       if (CurTok != '(')
 559         return ErrorP("Expected '(' in prototype");
 560
 561       // Read the list of argument names.
 562       std::vector<std::string> ArgNames;
 563       while (getNextToken() == tok_identifier)
 564         ArgNames.push_back(IdentifierStr);
 565       if (CurTok != ')')
 566         return ErrorP("Expected ')' in prototype");
 567
 568       // success.
 569       getNextToken();  // eat ')'.
 570
 571       return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
 572     }
 573
 574 Given this, a function definition is very simple, just a prototype plus
 575 an expression to implement the body:
 576
 577 .. code-block:: c++
 578
 579     /// definition ::= 'def' prototype expression
 580     static std::unique_ptr<FunctionAST> ParseDefinition() {
 581       getNextToken();  // eat def.
 582       auto Proto = ParsePrototype();
 583       if (!Proto) return nullptr;
 584
 585       if (auto E = ParseExpression())
 586         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 587       return nullptr;
 588     }
 589
 590 In addition, we support 'extern' to declare functions like 'sin' and
 591 'cos' as well as to support forward declaration of user functions. These
 592 'extern's are just prototypes with no body:
 593
 594 .. code-block:: c++
 595
 596     /// external ::= 'extern' prototype
 597     static std::unique_ptr<PrototypeAST> ParseExtern() {
 598       getNextToken();  // eat extern.
 599       return ParsePrototype();
 600     }
 601
 602 Finally, we'll also let the user type in arbitrary top-level expressions
 603 and evaluate them on the fly. We will handle this by defining anonymous
 604 nullary (zero argument) functions for them:
 605
 606 .. code-block:: c++
 607
 608     /// toplevelexpr ::= expression
 609     static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
 610       if (auto E = ParseExpression()) {
 611         // Make an anonymous proto.
 612         auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
 613         return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 614       }
 615       return nullptr;
 616     }
 617
 618 Now that we have all the pieces, let's build a little driver that will
 619 let us actually *execute* this code we've built!
 620
 621 The Driver
 622 ==========
 623
 624 The driver for this simply invokes all of the parsing pieces with a
 625 top-level dispatch loop. There isn't much interesting here, so I'll just
 626 include the top-level loop. See `below <#code>`_ for full code in the
 627 "Top-Level Parsing" section.
 628
 629 .. code-block:: c++
 630
 631     /// top ::= definition | external | expression | ';'
 632     static void MainLoop() {
 633       while (1) {
 634         fprintf(stderr, "ready> ");
 635         switch (CurTok) {
 636         case tok_eof:    return;
 637         case ';':        getNextToken(); break;  // ignore top-level semicolons.
 638         case tok_def:    HandleDefinition(); break;
 639         case tok_extern: HandleExtern(); break;
 640         default:         HandleTopLevelExpression(); break;
 641         }
 642       }
 643     }
 644
 645 The most interesting part of this is that we ignore top-level
 646 semicolons. Why is this, you ask? The basic reason is that if you type
 647 "4 + 5" at the command line, the parser doesn't know whether that is the
 648 end of what you will type or not. For example, on the next line you
 649 could type "def foo..." in which case 4+5 is the end of a top-level
 650 expression. Alternatively you could type "\* 6", which would continue
 651 the expression. Having top-level semicolons allows you to type "4+5;",
 652 and the parser will know you are done.
 653
 654 Conclusions
 655 ===========
 656
 657 With just under 400 lines of commented code (240 lines of non-comment,
 658 non-blank code), we fully defined our minimal language, including a
 659 lexer, parser, and AST builder. With this done, the executable will
 660 validate Kaleidoscope code and tell us if it is grammatically invalid.
 661 For example, here is a sample interaction:
 662
 663 .. code-block:: bash
 664
 665     $ ./a.out
 666     ready> def foo(x y) x+foo(y, 4.0);
 667     Parsed a function definition.
 668     ready> def foo(x y) x+y y;
 669     Parsed a function definition.
 670     Parsed a top-level expr
 671     ready> def foo(x y) x+y );
 672     Parsed a function definition.
 673     Error: unknown token when expecting an expression
 674     ready> extern sin(a);
 675     ready> Parsed an extern
 676     ready> ^D
 677     $
 678
 679 There is a lot of room for extension here. You can define new AST nodes,
 680 extend the language in many ways, etc. In the `next
 681 installment <LangImpl3.html>`_, we will describe how to generate LLVM
 682 Intermediate Representation (IR) from the AST.
 683
 684 Full Code Listing
 685 =================
 686
 687 Here is the complete code listing for this and the previous chapter.
 688 Note that it is fully self-contained: you don't need LLVM or any
 689 external libraries at all for this. (Besides the C and C++ standard
 690 libraries, of course.) To build this, just compile with:
 691
 692 .. code-block:: bash
 693
 694     # Compile
 695     clang++ -g -O3 toy.cpp
 696     # Run
 697     ./a.out
 698
 699 Here is the code:
 700
 701 .. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
 702    :language: c++
 703
 704 `Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_
 705