X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FMIRLangRef.rst;h=a5f8c8c743ab2829b79f1b3fcd434dbb4a691190;hb=daaec40323619188f16f744af624de4b30755f56;hp=845b2b754ff0b5c4997fb42429cfa27028ddead0;hpb=32dcc28f85a8ceee7ace549fb829214e614d6cf0;p=oota-llvm.git

diff --git a/docs/MIRLangRef.rst b/docs/MIRLangRef.rst
index 845b2b754ff..a5f8c8c743a 100644
--- a/docs/MIRLangRef.rst
+++ b/docs/MIRLangRef.rst
@@ -33,9 +33,74 @@ contain the serialized machine functions.
 
 .. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132
 
+MIR Testing Guide
+=================
+
+You can use the MIR format for testing in two different ways:
+
+- You can write MIR tests that invoke a single code generation pass using the
+  ``run-pass`` option in llc.
+
+- You can use llc's ``stop-after`` option with existing or new LLVM assembly
+  tests and check the MIR output of a specific code generation pass.
+
+Testing Individual Code Generation Passes
+-----------------------------------------
+
+The ``run-pass`` option in llc allows you to create MIR tests that invoke
+just a single code generation pass. When this option is used, llc will parse
+an input MIR file, run the specified code generation pass, and print the
+resulting MIR to the standard output stream.
+
+You can generate an input MIR file for the test by using the ``stop-after``
+option in llc. For example, if you would like to write a test for the
+post register allocation pseudo instruction expansion pass, you can specify
+the machine copy propagation pass in the ``stop-after`` option, as it runs
+just before the pass that we are trying to test:
+
+    ``llc -stop-after machine-cp bug-trigger.ll > test.mir``
+
+After generating the input MIR file, you'll have to add a run line that uses
+the ``-run-pass`` option to it. In order to test the post register allocation
+pseudo instruction expansion pass on X86-64, a run line like the one shown
+below can be used:
+
+    ``# RUN: llc -run-pass postrapseudos -march=x86-64 %s -o /dev/null | FileCheck %s``
+
+The MIR files are target dependent, so they have to be placed in the target
+specific test directories. They also need to specify a target triple or a
+target architecture either in the run line or in the embedded LLVM IR module.
+
+Limitations
+-----------
+
+Currently the MIR format has several limitations in terms of which state it
+can serialize:
+
+- The target-specific state in the target-specific ``MachineFunctionInfo``
+  subclasses isn't serialized at the moment.
+
+- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
+  SystemZ backends) aren't serialized at the moment.
+
+- The ``MCSymbol`` machine operands are only printed, they can't be parsed.
+
+- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
+  instructions and the variable debug information from MMI is serialized right
+  now.
+
+These limitations impose restrictions on what you can test with the MIR format.
+For now, tests that would like to test some behaviour that depends on the state
+of certain ``MCSymbol`` operands or the exception handling state in MMI, can't
+use the MIR format. As well as that, tests that test some behaviour that
+depends on the state of the target specific ``MachineFunctionInfo`` or
+``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment.
+
 High Level Structure
 ====================
 
+.. _embedded-module:
+
 Embedded Module
 ---------------
 
@@ -64,22 +129,21 @@ Machine Functions
 The remaining YAML documents contain the machine functions. This is an example
 of such YAML document:
 
-.. code-block:: yaml
+.. code-block:: llvm
 
      ---
      name: inc
      tracksRegLiveness: true
      liveins:
        - { reg: '%rdi' }
-     body:
-       - id: 0
-         name: entry
-         liveins: [ '%rdi' ]
-         instructions:
-           - '%eax = MOV32rm %rdi, 1, _, 0, _'
-           - '%eax = INC32r killed %eax, implicit-def dead %eflags'
-           - 'MOV32mr killed %rdi, 1, _, 0, _, %eax'
-           - 'RETQ %eax'
+     body: |
+       bb.0.entry:
+         liveins: %rdi
+
+         %eax = MOV32rm %rdi, 1, _, 0, _
+         %eax = INC32r killed %eax, implicit-def dead %eflags
+         MOV32mr killed %rdi, 1, _, 0, _, %eax
+         RETQ %eax
      ...
 
 The document above consists of attributes that represent the various
@@ -88,25 +152,332 @@ properties and data structures in a machine function.
 The attribute ``name`` is required, and its value should be identical to the name
 of a function that this machine function is based on.
 
-The attribute ``body`` contains a list of YAML mappings that represent the
-function's machine basic blocks.
+The attribute ``body`` is a `YAML block literal string`_. Its value represents
+the function's machine basic blocks and their machine instructions.
+
+Machine Instructions Format Reference
+=====================================
+
+The machine basic blocks and their instructions are represented using a custom,
+human readable serialization language. This language is used in the
+`YAML block literal string`_ that corresponds to the machine function's body.
+
+A source string that uses this language contains a list of machine basic
+blocks, which are described in the section below.
+
+Machine Basic Blocks
+--------------------
+
+A machine basic block is defined in a single block definition source construct
+that contains the block's ID.
+The example below defines two blocks that have an ID of zero and one:
+
+.. code-block:: llvm
+
+    bb.0:
+
+    bb.1:
+
+
+A machine basic block can also have a name. It should be specified after the ID
+in the block's definition:
+
+.. code-block:: llvm
+
+    bb.0.entry:       ; This block's name is "entry"
+
+
+The block's name should be identical to the name of the IR block that this
+machine block is based on.
+
+Block References
+^^^^^^^^^^^^^^^^
+
+The machine basic blocks are identified by their ID numbers. Individual
+blocks are referenced using the following syntax:
+
+.. code-block:: llvm
+
+    %bb.<id>[.<name>]
+
+Examples:
+
+.. code-block:: llvm
+
+    %bb.0
+    %bb.1.then
+
+Successors
+^^^^^^^^^^
+
+The machine basic block's successors have to be specified before any of the
+instructions:
+
+.. code-block:: llvm
+
+    bb.0.entry:
+      successors: %bb.1.then, %bb.2.else
+
+    bb.1.then:
+
+    bb.2.else:
+
+
+The branch weights can be specified in brackets after the successor blocks.
+The example below defines a block that has two successors with branch weights
+of 32 and 16:
+
+.. code-block:: llvm
+
+    bb.0.entry:
+      successors: %bb.1.then(32), %bb.2.else(16)
+
+.. _bb-liveins:
+
+Live In Registers
+^^^^^^^^^^^^^^^^^
+
+The machine basic block's live in registers have to be specified before any of
+the instructions:
+
+.. code-block:: llvm
+
+    bb.0.entry:
+      liveins: %edi, %esi
+
+The list of live in registers and successors can be empty. The language also
+allows multiple live in register and successor lists - they are combined into
+one list by the parser.
+
+Miscellaneous Attributes
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
+specified in brackets after the block's definition:
+
+.. code-block:: llvm
+
+    bb.0.entry (address-taken):
+
+    bb.2.else (align 4):
+
+    bb.3(landing-pad, align 4):
+
+
+.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
+   preserved.
+
+Machine Instructions
+--------------------
+
+A machine instruction is composed of a name,
+:ref:`machine operands <machine-operands>`,
+:ref:`instruction flags <instruction-flags>`, and machine memory operands.
+
+The instruction's name is usually specified before the operands. The example
+below shows an instance of the X86 ``RETQ`` instruction with a single machine
+operand:
+
+.. code-block:: llvm
+
+    RETQ %eax
+
+However, if the machine instruction has one or more explicitly defined register
+operands, the instruction's name has to be specified after them. The example
+below shows an instance of the AArch64 ``LDPXpost`` instruction with three
+defined register operands:
+
+.. code-block:: llvm
+
+    %sp, %fp, %lr = LDPXpost %sp, 2
+
+The instruction names are serialized using the exact definitions from the
+target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
+similar instruction names like ``TSTri`` and ``tSTRi`` represent different
+machine instructions.
+
+.. _instruction-flags:
+
+Instruction Flags
+^^^^^^^^^^^^^^^^^
+
+The flag ``frame-setup`` can be specified before the instruction's name:
+
+.. code-block:: llvm
+
+    %fp = frame-setup ADDXri %sp, 0, 0
+
+.. _registers:
+
+Registers
+---------
+
+Registers are one of the key primitives in the machine instructions
+serialization language. They are primarily used in the
+:ref:`register machine operands <register-operands>`,
+but they can also be used in a number of other places, like the
+:ref:`basic block's live in list <bb-liveins>`.
+
+The physical registers are identified by their name. They use the following
+syntax:
+
+.. code-block:: llvm
+
+    %<name>
+
+The example below shows three X86 physical registers:
+
+.. code-block:: llvm
+
+    %eax
+    %r15
+    %eflags
+
+The virtual registers are identified by their ID number. They use the following
+syntax:
+
+.. code-block:: llvm
+
+    %<id>
+
+Example:
+
+.. code-block:: llvm
+
+    %0
+
+The null registers are represented using an underscore ('``_``'). They can also be
+represented using a '``%noreg``' named register, although the former syntax
+is preferred.
+
+.. _machine-operands:
+
+Machine Operands
+----------------
+
+There are seventeen different kinds of machine operands, and all of them, except
+the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
+just printed out - they can't be parsed back yet.
+
+Immediate Operands
+^^^^^^^^^^^^^^^^^^
+
+The immediate machine operands are untyped, 64-bit signed integers. The
+example below shows an instance of the X86 ``MOV32ri`` instruction that has an
+immediate machine operand ``-42``:
+
+.. code-block:: llvm
+
+    %eax = MOV32ri -42
+
+.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands.
+
+.. _register-operands:
+
+Register Operands
+^^^^^^^^^^^^^^^^^
+
+The :ref:`register <registers>` primitive is used to represent the register
+machine operands. The register operands can also have optional
+:ref:`register flags <register-flags>`,
+:ref:`a subregister index <subregister-indices>`,
+and a reference to the tied register operand.
+The full syntax of a register operand is shown below:
+
+.. code-block:: llvm
+
+    [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]
+
+This example shows an instance of the X86 ``XOR32rr`` instruction that has
+5 register operands with different register flags:
+
+.. code-block:: llvm
+
+    dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al
+
+.. _register-flags:
+
+Register Flags
+~~~~~~~~~~~~~~
+
+The table below shows all of the possible register flags along with the
+corresponding internal ``llvm::RegState`` representation:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Flag
+     - Internal Value
+
+   * - ``implicit``
+     - ``RegState::Implicit``
+
+   * - ``implicit-def``
+     - ``RegState::ImplicitDefine``
+
+   * - ``def``
+     - ``RegState::Define``
+
+   * - ``dead``
+     - ``RegState::Dead``
+
+   * - ``killed``
+     - ``RegState::Kill``
+
+   * - ``undef``
+     - ``RegState::Undef``
+
+   * - ``internal``
+     - ``RegState::InternalRead``
+
+   * - ``early-clobber``
+     - ``RegState::EarlyClobber``
+
+   * - ``debug-use``
+     - ``RegState::Debug``
+
+.. _subregister-indices:
+
+Subregister Indices
+~~~~~~~~~~~~~~~~~~~
+
+The register machine operands can reference a portion of a register by using
+the subregister indices. The example below shows an instance of the ``COPY``
+pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8
+lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:
+
+.. code-block:: llvm
+
+    %1 = COPY %0:sub_8bit
+
+The names of the subregister indices are target specific, and are typically
+defined in the target's ``*RegisterInfo.td`` file.
+
+Global Value Operands
+^^^^^^^^^^^^^^^^^^^^^
+
+The global value machine operands reference the global values from the
+:ref:`embedded LLVM IR module <embedded-module>`.
+The example below shows an instance of the X86 ``MOV64rm`` instruction that has
+a global value operand named ``G``:
+
+.. code-block:: llvm
+
+    %rax = MOV64rm %rip, 1, _, @G, _
+
+The named global values are represented using an identifier with the '@' prefix.
+If the identifier doesn't match the regular expression
+`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted.
 
-The first machine basic block in the ``body`` list above contains the attribute
-``instructions``. This attribute stores a list of string literals which
-represent the machine instructions for that basic block.
+The unnamed global values are represented using an unsigned numeric value with
+the '@' prefix, like in the following examples: ``@0``, ``@989``.
 
 .. TODO: Describe the parsers default behaviour when optional YAML attributes
    are missing.
-.. TODO: Describe the syntax of the machine instructions.
-.. TODO: Describe the syntax of the immediate machine operands.
-.. TODO: Describe the syntax of the register machine operands.
-.. TODO: Describe the syntax of the virtual register operands and their YAML
-   definitions.
-.. TODO: Describe the syntax of the register operand flags and the subregisters.
+.. TODO: Describe the syntax for the bundled instructions.
+.. TODO: Describe the syntax for virtual register YAML definitions.
 .. TODO: Describe the machine function's YAML flag attributes.
-.. TODO: Describe the machine basic block's YAML flag, successors and livein
-   attributes. Describe the syntax for the machine basic block operands.
-.. TODO: Describe the syntax for the global value, external symbol and register
+.. TODO: Describe the syntax for the external symbol and register
    mask machine operands.
 .. TODO: Describe the frame information YAML mapping.
 .. TODO: Describe the syntax of the stack object machine operands and their