From 26b8aab72e6895d7851c0b4568a54920162dd736 Mon Sep 17 00:00:00 2001
From: Sean Silva
Date: Mon, 7 Jan 2013 02:43:44 +0000
Subject: [PATCH] tblgen, docs: Add initial syntax reference.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171685 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/TableGen/LangRef.rst | 375 ++++++++++++++++++++++++++++++++++++++
 docs/subsystems.rst       |   1 +
 2 files changed, 376 insertions(+)
 create mode 100644 docs/TableGen/LangRef.rst

diff --git a/docs/TableGen/LangRef.rst b/docs/TableGen/LangRef.rst
new file mode 100644
index 00000000000..34098a0eb6b
--- /dev/null
+++ b/docs/TableGen/LangRef.rst
@@ -0,0 +1,375 @@
+===========================
+TableGen Language Reference
+===========================
+
+.. sectionauthor:: Sean Silva
+
+.. contents::
+   :local:
+
+.. warning::
+   This document is extremely rough. If you find something lacking, please
+   fix it, file a documentation bug, or ask about it on llvmdev.
+
+Introduction
+============
+
+This document is meant to be a normative spec about the TableGen language
+in and of itself (i.e. how to understand a given construct in terms of how
+it affects the final set of records represented by the TableGen file). If
+you are unsure if this document is really what you are looking for, please
+read :doc:`/TableGenFundamentals` first.
+
+Notation
+========
+
+The lexical and syntax notation used here is intended to imitate
+`Python's`_. In particular, for lexical definitions, the productions
+operate at the character level and there is no implied whitespace between
+elements. The syntax definitions operate at the token level, so there is
+implied whitespace between tokens.
+
+.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation
+
+Lexical Analysis
+================
+
+TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
+comments.
+
+The following is a listing of the basic punctuation tokens::
+
+   - + [ ] { } ( ) < > : ; . = ? #
+
+Numeric literals take one of the following forms:
+
+.. TableGen actually will lex some pretty strange sequences and interpret
+   them as numbers. What is shown here is an attempt to approximate what it
+   "should" accept.
+
+.. productionlist::
+   TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
+   DecimalInteger: ["+" | "-"] ("0"..."9")+
+   HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
+   BinInteger: "0b" ("0" | "1")+
+
+One aspect to note is that the :token:`DecimalInteger` token *includes* the
+``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
+most languages do.
+
+TableGen has identifier-like tokens:
+
+.. productionlist::
+   ualpha: "a"..."z" | "A"..."Z" | "_"
+   TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
+   TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*
+
+Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
+begin with a number. In case of ambiguity, a token will be interpreted as a
+numeric literal rather than an identifier.
+
+TableGen also has two string-like literals:
+
+.. productionlist::
+   TokString: '"' <non-'"' characters and C-like escapes> '"'
+   TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"
+
+TableGen also has the following keywords::
+
+   bit   bits      class   code         dag
+   def   foreach   defm    field        in
+   int   let       list    multiclass   string
+
+TableGen also has "bang operators" which have a
+wide variety of meanings::
+
+   !eq     !if      !head    !tail      !con
+   !shl    !sra     !srl
+   !cast   !empty   !subst   !foreach   !strconcat
+
+Syntax
+======
+
+TableGen has an ``include`` mechanism. It does not play a role in the
+syntax per se, since it is lexically replaced with the contents of the
+included file.
+
+.. productionlist::
+   IncludeDirective: "include" `TokString`
+
+TableGen's top-level production consists of "objects".
+
+.. productionlist::
+   TableGenFile: `Object`*
+   Object: `Class` | `Def` | `Defm` | `Let` | `MultiClass` | `Foreach`
+
+``class``\es
+------------
+
+.. productionlist::
+   Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
+
+A ``class`` declaration creates a record which other records can inherit
+from. A class can be parametrized by a list of "template arguments", whose
+values can be used in the class body.
+
+A given class can only be defined once. A ``class`` declaration is
+considered to define the class if any of the following is true:
+
+.. break ObjectBody into its constituents so that they are present here?
+
+#. The :token:`TemplateArgList` is present.
+#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
+#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.
+
+You can declare an empty class by giving an empty :token:`TemplateArgList`
+and an empty :token:`ObjectBody`. This can serve as a restricted form of
+forward declaration: note that records deriving from the forward-declared
+class will inherit no fields from it since the record expansion is done
+when the record is parsed.
+
+.. productionlist::
+   TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
+
+Declarations
+------------
+
+.. Omitting mention of arcane "field" prefix to discourage its use.
+
+The declaration syntax is pretty much what you would expect as a C++
+programmer.
+
+.. productionlist::
+   Declaration: `Type` `TokIdentifier` ["=" `Value`]
+
+It assigns the value to the identifier.
+
+Types
+-----
+
+.. productionlist::
+   Type: "string" | "code" | "bit" | "int" | "dag"
+       :| "bits" "<" `TokInteger` ">"
+       :| "list" "<" `Type` ">"
+       :| `ClassID`
+   ClassID: `TokIdentifier`
+
+Both ``string`` and ``code`` correspond to the string type; the difference
+is purely to indicate programmer intention.
+
+The :token:`ClassID` must identify a class that has been previously
+declared or defined.
+
+Values
+------
+
+.. productionlist::
+   Value: `SimpleValue` `ValueSuffix`*
+   ValueSuffix: "{" `RangeList` "}"
+              :| "[" `RangeList` "]"
+              :| "." `TokIdentifier`
+   RangeList: `RangePiece` ("," `RangePiece`)*
+   RangePiece: `TokInteger`
+             :| `TokInteger` "-" `TokInteger`
+             :| `TokInteger` `TokInteger`
+
+The peculiar last form of :token:`RangePiece` is due to the fact that the
+"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
+two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
+instead of "1", "-", and "5".
+The :token:`RangeList` can be thought of as specifying a "list slice" in some
+contexts.
+
+
+:token:`SimpleValue` has a number of forms:
+
+
+.. productionlist::
+   SimpleValue: `TokIdentifier`
+
+The value will be the variable referenced by the identifier. It can be one
+of:
+
+.. The code for this is exceptionally abstruse. These examples are a
+   best-effort attempt.
+
+* name of a ``def``, such as the use of ``Bar`` in::
+
+     def Bar : SomeClass {
+       int X = 5;
+     }
+
+     def Foo {
+       SomeClass Baz = Bar;
+     }
+
+* value local to a ``def``, such as the use of ``Bar`` in::
+
+     def Foo {
+       int Bar = 5;
+       int Baz = Bar;
+     }
+
+* a template arg of a ``class``, such as the use of ``Bar`` in::
+
+     class Foo<int Bar> {
+       int Baz = Bar;
+     }
+
+* value local to a ``multiclass``, such as the use of ``Bar`` in::
+
+     multiclass Foo {
+       int Bar = 5;
+       int Baz = Bar;
+     }
+
+* a template arg to a ``multiclass``, such as the use of ``Bar`` in::
+
+     multiclass Foo<int Bar> {
+       int Baz = Bar;
+     }
+
+.. productionlist::
+   SimpleValue: `TokInteger`
+
+This represents the numeric value of the integer.
+
+.. productionlist::
+   SimpleValue: `TokString`+
+
+Multiple adjacent string literals are concatenated like in C/C++. The value
+is the concatenation of the strings.
+
+.. productionlist::
+   SimpleValue: `TokCodeFragment`
+
+The value is the string value of the code fragment.
+
+.. productionlist::
+   SimpleValue: "?"
+
+``?`` represents an "unset" initializer.
+
+.. productionlist::
+   SimpleValue: "{" `ValueList` "}"
+   ValueList: [`ValueListNE`]
+   ValueListNE: `Value` ("," `Value`)*
+
+This represents a sequence of bits, as would be used to initialize a
+``bits<n>`` field (where ``n`` is the number of bits).
+
+.. productionlist::
+   SimpleValue: `ClassID` "<" `ValueListNE` ">"
+
+This generates a new anonymous record definition (as would be created by an
+unnamed ``def`` inheriting from the given class with the given template
+arguments) and the value is the value of that record definition.
+
+.. productionlist::
+   SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]
+
+A list initializer. The optional :token:`Type` can be used to indicate a
+specific element type, otherwise the element type will be deduced from the
+given values.
+
+.. The initial `DagArg` of the dag must start with an identifier or
+   !cast, but this is more of an implementation detail and so for now just
+   leave it out.
+
+.. productionlist::
+   SimpleValue: "(" `DagArg` `DagArgList` ")"
+   DagArgList: `DagArg` ("," `DagArg`)*
+   DagArg: `Value` [":" `TokVarName`]
+
+The initial :token:`DagArg` is called the "operator" of the dag.
+
+.. productionlist::
+   SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
+
+Bodies
+------
+
+.. productionlist::
+   ObjectBody: `BaseClassList` `Body`
+   BaseClassList: [`BaseClassListNE`]
+   BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
+   SubClassRef: (`ClassID` | `DefmID`) ["<" `ValueList` ">"]
+   DefmID: `TokIdentifier`
+
+The version with the :token:`DefmID` is only valid in the
+:token:`BaseClassList` of a ``defm``.
+The :token:`DefmID` should be the name of a ``multiclass``.
+
+.. put this somewhere else
+
+It is after parsing the base class list that the "let stack" is applied.
+
+.. productionlist::
+   Body: ";" | "{" BodyList "}"
+   BodyList: BodyItem*
+   BodyItem: `Declaration` ";"
+           :| "let" `TokIdentifier` [`RangeList`] "=" `Value` ";"
+
+The ``let`` form allows overriding the value of an inherited field.
+
+``def``
+-------
+
+.. TODO::
+   There can be pastes in the names here, like ``#NAME#``. Look into that
+   and document it (it boils down to ParseIDValue with IDParseMode ==
+   ParseNameMode). ParseObjectName calls into the general ParseValue, with
+   the only difference from "arbitrary expression parsing" being IDParseMode
+   == Mode.
+
+.. productionlist::
+   Def: "def" `TokIdentifier` `ObjectBody`
+
+Defines a record whose name is given by the :token:`TokIdentifier`. The
+fields of the record are inherited from the base classes and defined in the
+body.
+
+Special handling occurs if this ``def`` appears inside a ``multiclass`` or
+a ``foreach``.
+
+``defm``
+--------
+
+.. productionlist::
+   Defm: "defm" `TokIdentifier` ":" `BaseClassList` ";"
+
+Note that in the :token:`BaseClassList`, all of the ``multiclass``'s must
+precede any ``class``'s that appear.
+
+``foreach``
+-----------
+
+.. productionlist::
+   Foreach: "foreach" `Declaration` "in" "{" `Object`* "}"
+          :| "foreach" `Declaration` "in" `Object`
+
+The value assigned to the variable in the declaration is iterated over and
+the object or object list is reevaluated with the variable set at each
+iterated value.
+
+Top-Level ``let``
+-----------------
+
+.. productionlist::
+   Let: "let" `LetList` "in" "{" `Object`* "}"
+      :| "let" `LetList` "in" `Object`
+   LetList: `LetItem` ("," `LetItem`)*
+   LetItem: `TokIdentifier` [`RangeList`] "=" `Value`
+
+This is effectively equivalent to ``let`` inside the body of a record
+except that it applies to multiple records at a time. The bindings are
+applied at the end of parsing the base classes of a record.
+
+``multiclass``
+--------------
+
+.. productionlist::
+   MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
+             : [":" `BaseMultiClassList`] "{" `MultiClassDef`+ "}"
+   BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
+   MultiClassID: `TokIdentifier`
diff --git a/docs/subsystems.rst b/docs/subsystems.rst
index 0d36b54b01e..505c573075a 100644
--- a/docs/subsystems.rst
+++ b/docs/subsystems.rst
@@ -24,6 +24,7 @@ Subsystem Documentation
    WritingAnLLVMBackend
    GarbageCollection
    WritingAnLLVMPass
+   TableGen/LangRef
 
 * :doc:`WritingAnLLVMPass`
 
-- 
2.34.1