Merging r261039:

diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst

index 72b12539cd73ac891201768f0aff1087e4108dbe..f3b949c7ad157fac32e43ca1647a07b9a757e8ee 100644
--- a/docs/CodeGenerator.rst
+++ b/docs/CodeGenerator.rst
@@ -1,5 +1,3 @@
-.. _code_generator:
-
  ==========================================
  The LLVM Target-Independent Code Generator
  ==========================================
@@ -17,6 +15,8 @@ The LLVM Target-Independent Code Generator
      .partial { background-color: #F88017 }
      .yes { background-color: #0F0; }
      .yes:before { content: "Y" }
+    .na { background-color: #6666FF; }
+    .na:before { content: "N/A" }
    </style>
  
  .. contents::
@@ -70,7 +70,7 @@ different pieces of this will be useful to you.  In any case, you should be
  familiar with the `target description`_ and `machine code representation`_
  classes.  If you want to add a backend for a new target, you will need to
  `implement the target description`_ classes for your new target and understand
-the `LLVM code representation <LangRef.html>`_.  If you are interested in
+the :doc:`LLVM code representation <LangRef>`.  If you are interested in
  implementing a new `code generation algorithm`_, it should only depend on the
  target-description and machine code representation classes, ensuring that it is
  portable.
@@ -81,7 +81,7 @@ Required components in the code generator
  The two pieces of the LLVM code generator are the high-level interface to the
  code generator and the set of reusable components that can be used to build
  target-specific backends.  The two most important interfaces (:raw-html:`<tt>`
-`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `TargetData`_
+`TargetMachine`_ :raw-html:`</tt>` and :raw-html:`<tt>` `DataLayout`_
  :raw-html:`</tt>`) are the only ones that are required to be defined for a
  backend to fit into the LLVM system, but the others must be defined if the
  reusable code generator components are going to be used.
@@ -172,7 +172,7 @@ architecture.  These target descriptions often have a large amount of common
  information (e.g., an ``add`` instruction is almost identical to a ``sub``
  instruction).  In order to allow the maximum amount of commonality to be
  factored out, the LLVM code generator uses the
-`TableGen <TableGenFundamentals.html>`_ tool to describe big chunks of the
+:doc:`TableGen/index` tool to describe big chunks of the
  target machine, which allows the use of domain-specific and target-specific
  abstractions to reduce the amount of repetition.
  
@@ -197,7 +197,7 @@ any particular client.  These classes are designed to capture the *abstract*
  properties of the target (such as the instructions and registers it has), and do
  not incorporate any particular pieces of code generation algorithms.
  
-All of the target description classes (except the :raw-html:`<tt>` `TargetData`_
+All of the target description classes (except the :raw-html:`<tt>` `DataLayout`_
  :raw-html:`</tt>` class) are designed to be subclassed by the concrete target
  implementation, and have virtual methods implemented.  To get to these
  implementations, the :raw-html:`<tt>` `TargetMachine`_ :raw-html:`</tt>` class
@@ -214,23 +214,23 @@ the ``get*Info`` methods (``getInstrInfo``, ``getRegisterInfo``,
  ``getFrameInfo``, etc.).  This class is designed to be specialized by a concrete
  target implementation (e.g., ``X86TargetMachine``) which implements the various
  virtual methods.  The only required target description class is the
-:raw-html:`<tt>` `TargetData`_ :raw-html:`</tt>` class, but if the code
+:raw-html:`<tt>` `DataLayout`_ :raw-html:`</tt>` class, but if the code
  generator components are to be used, the other interfaces should be implemented
  as well.
  
-.. _TargetData:
+.. _DataLayout:
  
-The ``TargetData`` class
+The ``DataLayout`` class
  ------------------------
  
-The ``TargetData`` class is the only required target description class, and it
-is the only class that is not extensible (you cannot derived a new class from
-it).  ``TargetData`` specifies information about how the target lays out memory
+The ``DataLayout`` class is the only required target description class, and it
+is the only class that is not extensible (you cannot derive a new class from
+it).  ``DataLayout`` specifies information about how the target lays out memory
  for structures, the alignment requirements for various data types, the size of
  pointers in the target, and whether the target is little-endian or
  big-endian.
  
-.. _targetlowering:
+.. _TargetLowering:
  
  The ``TargetLowering`` class
  ----------------------------
@@ -248,7 +248,9 @@ operations.  Among other things, this class indicates:
  * the type to use for shift amounts, and
  
  * various high-level characteristics, like whether it is profitable to turn
-  division by a constant into a multiplication sequence
+  division by a constant into a multiplication sequence.
+
+.. _TargetRegisterInfo:
  
  The ``TargetRegisterInfo`` class
  --------------------------------
@@ -256,10 +258,10 @@ The ``TargetRegisterInfo`` class
  The ``TargetRegisterInfo`` class is used to describe the register file of the
  target and any interactions between the registers.
  
-Registers in the code generator are represented in the code generator by
-unsigned integers.  Physical registers (those that actually exist in the target
-description) are unique small numbers, and virtual registers are generally
-large.  Note that register ``#0`` is reserved as a flag value.
+Registers are represented in the code generator by unsigned integers.  Physical
+registers (those that actually exist in the target description) are unique
+small numbers, and virtual registers are generally large.  Note that
+register ``#0`` is reserved as a flag value.
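As a rough illustration of this numbering scheme, the sketch below classifies a register number.  The boundary ``kFirstVirtualReg`` and the helper names are hypothetical, chosen only for this example; the actual boundary and query functions are defined by LLVM and are not shown in this section.

```cpp
#include <cstdint>

// Hypothetical boundary for illustration only: numbers below it are treated
// as physical registers, numbers at or above it as virtual registers.
constexpr std::uint32_t kFirstVirtualReg = 1024;

// Register #0 is reserved as a flag value, so it is neither physical nor
// virtual in this sketch.
constexpr bool isNoRegister(std::uint32_t reg) { return reg == 0; }

// Physical registers are unique small numbers.
constexpr bool isPhysicalRegister(std::uint32_t reg) {
  return reg != 0 && reg < kFirstVirtualReg;
}

// Virtual registers are generally large numbers.
constexpr bool isVirtualRegister(std::uint32_t reg) {
  return reg >= kFirstVirtualReg;
}

// Compile-time checks of the three cases.
static_assert(isNoRegister(0), "register #0 is the flag value");
static_assert(isPhysicalRegister(5), "small numbers are physical");
static_assert(isVirtualRegister(1026), "large numbers are virtual");
```

Under this assumed boundary, a number such as 1026 is classified as virtual, while a small number such as 5 is classified as physical.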
  
  Each register in the processor description has an associated
  ``TargetRegisterDesc`` entry, which provides a textual name for the register
@@ -275,7 +277,7 @@ an associated register class.  When the register allocator runs, it replaces
  virtual registers with a physical register in the set.
  
  The target-specific implementations of these classes are auto-generated from a
-`TableGen <TableGenFundamentals.html>`_ description of the register file.
+:doc:`TableGen/index` description of the register file.
  
  .. _TargetInstrInfo:
  
@@ -283,17 +285,15 @@ The ``TargetInstrInfo`` class
  -----------------------------
  
  The ``TargetInstrInfo`` class is used to describe the machine instructions
-supported by the target. It is essentially an array of ``TargetInstrDescriptor``
-objects, each of which describes one instruction the target
-supports. Descriptors define things like the mnemonic for the opcode, the number
-of operands, the list of implicit register uses and defs, whether the
-instruction has certain target-independent properties (accesses memory, is
-commutable, etc), and holds any target-specific flags.
-
-The ``TargetFrameInfo`` class
------------------------------
+supported by the target.  Descriptions define things like the mnemonic for
+the opcode, the number of operands, the list of implicit register uses and defs,
+whether the instruction has certain target-independent properties (accesses
+memory, is commutable, etc.), and any target-specific flags.
+
+The ``TargetFrameLowering`` class
+---------------------------------
  
-The ``TargetFrameInfo`` class is used to provide information about the stack
+The ``TargetFrameLowering`` class is used to provide information about the stack
  frame layout of the target. It holds the direction of stack growth, the known
  stack alignment on entry to each function, and the offset to the local area.
  The offset to the local area is the offset from the stack pointer on function
@@ -434,12 +434,12 @@ For example, consider this simple LLVM example:
  .. code-block:: llvm
  
    define i32 @test(i32 %X, i32 %Y) {
-    %Z = udiv i32 %X, %Y
+    %Z = sdiv i32 %X, %Y
      ret i32 %Z
    }
  
-The X86 instruction selector produces this machine code for the ``div`` and
-``ret`` (use "``llc X.bc -march=x86 -print-machineinstrs``" to get this):
+The X86 instruction selector might produce this machine code for the ``div`` and
+``ret``:
  
  .. code-block:: llvm
  
@@ -454,8 +454,8 @@ The X86 instruction selector produces this machine code for the ``div`` and
    %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
    ret
  
-By the end of code generation, the register allocator has coalesced the
-registers and deleted the resultant identity moves producing the following
+By the end of code generation, the register allocator would coalesce the
+registers and delete the resultant identity moves producing the following
  code:
  
  .. code-block:: llvm
@@ -464,7 +464,7 @@ code:
    mov %EAX, %EDX
    sar %EDX, 31
    idiv %ECX
-  ret 
+  ret
  
  This approach is extremely general (if it can handle the X86 architecture, it
  can handle anything!) and allows all of the target specific knowledge about the
@@ -636,6 +636,18 @@ file (MCObjectStreamer).  MCAsmStreamer is a straight-forward implementation
  that prints out a directive for each method (e.g. ``EmitValue -> .byte``), but
  MCObjectStreamer implements a full assembler.
  
+For target-specific directives, the MCStreamer has an MCTargetStreamer instance.
+Each target that needs it defines a class that inherits from it and is a lot
+like MCStreamer itself: it has one method per directive and two classes that
+inherit from it, a target object streamer and a target asm streamer. The target
+asm streamer just prints it (``emitFnStart -> .fnstart``), and the object
+streamer implements the assembler logic for it.
+
+To make LLVM use these classes, the target initialization must call
+TargetRegistry::RegisterAsmStreamer and TargetRegistry::RegisterMCObjectStreamer
+passing callbacks that allocate the corresponding target streamer and pass it
+to createAsmStreamer or to the appropriate object streamer constructor.
+
  The ``MCContext`` class
  -----------------------
  
@@ -737,7 +749,7 @@ The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
  ``SDNode`` class.  The primary payload of the ``SDNode`` is its operation code
  (Opcode) that indicates what operation the node performs and the operands to the
  operation.  The various operation node types are described at the top of the
-``include/llvm/CodeGen/SelectionDAGNodes.h`` file.
+``include/llvm/CodeGen/ISDOpcodes.h`` file.
  
  Although most operations define a single value, each node in the graph may
  define multiple values.  For example, a combined div/rem operation will define
@@ -757,7 +769,9 @@ provide an ordering between nodes that have side effects (such as loads, stores,
  calls, returns, etc).  All nodes that have side effects should take a token
  chain as input and produce a new one as output.  By convention, token chain
  inputs are always operand #0, and chain results are always the last value
-produced by an operation.
+produced by an operation. However, after instruction selection, the
+machine nodes have their chain after the instruction's operands, and
+may be followed by glue nodes.
  
  A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
  always a marker node with an Opcode of ``ISD::EntryToken``.  The Root node is
@@ -771,6 +785,8 @@ value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a
  SREM or UREM operation.  The `legalize types`_ and `legalize operations`_ phases
  are responsible for turning an illegal DAG into a legal DAG.
  
+.. _SelectionDAG-Process:
+
  SelectionDAG Instruction Selection Process
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
@@ -813,7 +829,7 @@ One great way to visualize what is going on here is to take advantage of a few
  LLC command line options.  The following options pop up a window displaying the
  SelectionDAG at specific times (if you only get errors printed to the console
  while using this, you probably `need to configure your
-system <ProgrammersManual.html#ViewGraph>`_ to add support for it).
+system <ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ to add support for it).
  
  * ``-view-dag-combine1-dags`` displays the DAG after being built, before the
    first optimization pass.
@@ -832,14 +848,17 @@ is based on the final SelectionDAG, with nodes that must be scheduled together
  bundled into a single scheduling-unit node, and with immediate operands and
  other nodes that aren't relevant for scheduling omitted.
  
+The option ``-filter-view-dags`` lets you select the name of the basic block
+that you are interested in visualizing and filters all the previous
+``view-*-dags`` options.
+
  .. _Build initial DAG:
  
  Initial SelectionDAG Construction
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
  The initial SelectionDAG is na\ :raw-html:`&iuml;`\ vely peephole expanded from
-the LLVM input by the ``SelectionDAGLowering`` class in the
-``lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp`` file.  The intent of this pass
+the LLVM input by the ``SelectionDAGBuilder`` class.  The intent of this pass
  is to expose as many low-level, target-specific details to the SelectionDAG as
  possible.  This pass is mostly hard-coded (e.g. an LLVM ``add`` turns into an
  ``SDNode add`` while a ``getelementptr`` is expanded into the obvious
@@ -875,7 +894,7 @@ found, the elements are converted to scalars ("scalarizing").
  
  A target implementation tells the legalizer which types are supported (and which
  register class to use for them) by calling the ``addRegisterClass`` method in
-its TargetLowering constructor.
+its ``TargetLowering`` constructor.
  
  .. _legalize operations:
  .. _Legalizer:
@@ -969,7 +988,8 @@ The ``FADDS`` instruction is a simple binary single-precision add instruction.
  To perform this pattern match, the PowerPC backend includes the following
  instruction definitions:
  
-::
+.. code-block:: text
+  :emphasize-lines: 4-5,9
  
    def FMADDS : AForm_1<59, 29,
                        (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
@@ -981,10 +1001,10 @@ instruction definitions:
                        "fadds $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
  
-The portion of the instruction definition in bold indicates the pattern used to
-match the instruction.  The DAG operators (like ``fmul``/``fadd``) are defined
-in the ``include/llvm/Target/TargetSelectionDAG.td`` file.  " ``F4RC``" is the
-register class of the input and result values.
+The highlighted portion of the instruction definitions indicates the pattern
+used to match the instructions. The DAG operators (like ``fmul``/``fadd``)
+are defined in the ``include/llvm/Target/TargetSelectionDAG.td`` file.
+"``F4RC``" is the register class of the input and result values.
  
  The TableGen DAG instruction selector generator reads the instruction patterns
  in the ``.td`` file and automatically builds parts of the pattern matching code
@@ -1036,6 +1056,24 @@ for your target.  It has the following strengths:
    are used to manipulate the input immediate (in this case, take the high or low
    16-bits of the immediate).
  
+* When using the 'Pat' class to map a pattern to an instruction that has one
+  or more complex operands (e.g. `X86 addressing mode`_), the pattern may
+  either specify the operand as a whole using a ``ComplexPattern``, or else it
+  may specify the components of the complex operand separately.  The latter is
+  done e.g. for pre-increment instructions by the PowerPC back end:
+
+  ::
+
+    def STWU  : DForm_1<37, (outs ptr_rc:$ea_res), (ins GPRC:$rS, memri:$dst),
+                    "stwu $rS, $dst", LdStStoreUpd, []>,
+                    RegConstraint<"$dst.reg = $ea_res">, NoEncode<"$ea_res">;
+
+    def : Pat<(pre_store GPRC:$rS, ptr_rc:$ptrreg, iaddroff:$ptroff),
+              (STWU GPRC:$rS, iaddroff:$ptroff, ptr_rc:$ptrreg)>;
+
+  Here, the pair of ``ptroff`` and ``ptrreg`` operands is matched onto the
+  complex operand ``dst`` of class ``memri`` in the ``STWU`` instruction.
+
  * While the system does automate a lot, it still allows you to write custom C++
    code to match special cases if there is something that is hard to
    express.
@@ -1196,7 +1234,7 @@ used. Each virtual register can only be mapped to physical registers of a
  particular class. For instance, in the X86 architecture, some virtuals can only
  be allocated to 8 bit registers.  A register class is described by
  ``TargetRegisterClass`` objects.  To discover if a virtual register is
-compatible with a given physical, this code can be used:</p>
+compatible with a given physical, this code can be used:
  
  .. code-block:: c++
  
@@ -1302,7 +1340,7 @@ found before being stored or after being reloaded.
  If the indirect strategy is used, after all the virtual registers have been
  mapped to physical registers or stack slots, it is necessary to use a spiller
  object to place load and store instructions in the code. Every virtual that has
-been mapped to a stack slot will be stored to memory after been defined and will
+been mapped to a stack slot will be stored to memory after being defined and will
  be loaded before being used. The implementation of the spiller tries to recycle
  load/store instructions, avoiding unnecessary instructions. For an example of
  how to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in
@@ -1315,7 +1353,7 @@ With very rare exceptions (e.g., function calls), the LLVM machine code
  instructions are three address instructions. That is, each instruction is
  expected to define at most one register, and to use at most two registers.
  However, some architectures use two address instructions. In this case, the
-defined register is also one of the used register. For instance, an instruction
+defined register is also one of the used registers. For instance, an instruction
  such as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX +
  %EBX``.
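One way to picture the relationship between the two forms is the sketch below, which rewrites a three-address operation into a copy followed by a two-address operation.  The ``ThreeAddr`` struct and the ``toTwoAddress`` function are hypothetical names for this illustration and are not part of LLVM's API; the textual ``MOV`` is likewise only a stand-in for a register copy.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical three-address instruction: dst = lhs <op> rhs.
struct ThreeAddr {
  std::string op, dst, lhs, rhs;
};

// Rewrite into two-address form, where the defined register is also the
// first used register: copy lhs into dst, then apply "op dst, rhs".
std::vector<std::string> toTwoAddress(const ThreeAddr &inst) {
  // If dst is the same register as rhs (and not lhs), the copy below would
  // overwrite rhs before it is read. This sketch rejects that case.
  if (inst.dst == inst.rhs && inst.dst != inst.lhs)
    throw std::invalid_argument("dst aliases rhs; copy would clobber it");

  std::vector<std::string> out;
  // The copy is unnecessary when dst already holds lhs.
  if (inst.dst != inst.lhs)
    out.push_back("MOV " + inst.dst + ", " + inst.lhs);
  out.push_back(inst.op + " " + inst.dst + ", " + inst.rhs);
  return out;
}
```

For the instruction ``{"ADD", "%EAX", "%EBX", "%ECX"}`` this yields ``MOV %EAX, %EBX`` followed by ``ADD %EAX, %ECX``; when the destination already equals the first source, only the two-address operation is emitted.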
  
@@ -1540,7 +1578,7 @@ three important things that you have to implement for your target:
     correspond to. The MCInsts that are generated by this are fed into the
     instruction printer or the encoder.
  
-Finally, at your choosing, you can also implement an subclass of MCCodeEmitter
+Finally, at your choosing, you can also implement a subclass of MCCodeEmitter
  which lowers MCInst's into machine code bytes and relocations.  This is
  important if you want to support direct .o file emission, or would like to
  implement an assembler for your target.
@@ -1594,7 +1632,7 @@ Implementing a Native Assembler
  ===============================
  
  Though you're probably reading this because you want to write or maintain a
-compiler backend, LLVM also fully supports building a native assemblers too.
+compiler backend, LLVM also fully supports building a native assembler.
  We've tried hard to automate the generation of the assembler from the .td files
  (in particular the instruction syntax and encodings), which means that a large
  part of the manual and repetitive data entry can be factored and shared with the
@@ -1651,7 +1689,7 @@ ones supported by the matcher), through a Requires clause:
    def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
    def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
  
-In this example, the mnemonic gets mapped into different a new one depending on
+In this example, the mnemonic gets mapped into a different one depending on
  the current instruction set.
  
  Instruction Aliases
@@ -1728,6 +1766,8 @@ This section of the document explains features or design decisions that are
  specific to the code generator for a particular target.  First we start with a
  table that summarizes what features are supported by each target.
  
+.. _target-feature-matrix:
+
  Target Feature Matrix
  ---------------------
  
@@ -1742,12 +1782,14 @@ the key:
  :raw-html:`<table border="1" cellspacing="0">`
  :raw-html:`<tr>`
  :raw-html:`<th>Unknown</th>`
+:raw-html:`<th>Not Applicable</th>`
  :raw-html:`<th>No support</th>`
  :raw-html:`<th>Partial Support</th>`
  :raw-html:`<th>Complete Support</th>`
  :raw-html:`</tr>`
  :raw-html:`<tr>`
  :raw-html:`<td class="unknown"></td>`
+:raw-html:`<td class="na"></td>`
  :raw-html:`<td class="no"></td>`
  :raw-html:`<td class="partial"></td>`
  :raw-html:`<td class="yes"></td>`
@@ -1763,136 +1805,136 @@ Here is the table:
  :raw-html:`<tr>`
  :raw-html:`<th>Feature</th>`
  :raw-html:`<th>ARM</th>`
-:raw-html:`<th>CellSPU</th>`
  :raw-html:`<th>Hexagon</th>`
-:raw-html:`<th>MBlaze</th>`
  :raw-html:`<th>MSP430</th>`
  :raw-html:`<th>Mips</th>`
-:raw-html:`<th>PTX</th>`
+:raw-html:`<th>NVPTX</th>`
  :raw-html:`<th>PowerPC</th>`
  :raw-html:`<th>Sparc</th>`
+:raw-html:`<th>SystemZ</th>`
  :raw-html:`<th>X86</th>`
  :raw-html:`<th>XCore</th>`
+:raw-html:`<th>eBPF</th>`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_reliable">is generally reliable</a></td>`
  :raw-html:`<td class="yes"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="yes"></td> <!-- Hexagon -->`
-:raw-html:`<td class="no"></td> <!-- MBlaze -->`
  :raw-html:`<td class="unknown"></td> <!-- MSP430 -->`
  :raw-html:`<td class="yes"></td> <!-- Mips -->`
-:raw-html:`<td class="no"></td> <!-- PTX -->`
+:raw-html:`<td class="yes"></td> <!-- NVPTX -->`
  :raw-html:`<td class="yes"></td> <!-- PowerPC -->`
  :raw-html:`<td class="yes"></td> <!-- Sparc -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
-:raw-html:`<td class="unknown"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_asmparser">assembly parser</a></td>`
  :raw-html:`<td class="no"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="no"></td> <!-- Hexagon -->`
-:raw-html:`<td class="yes"></td> <!-- MBlaze -->`
  :raw-html:`<td class="no"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="no"></td> <!-- PTX -->`
+:raw-html:`<td class="no"></td> <!-- NVPTX -->`
  :raw-html:`<td class="no"></td> <!-- PowerPC -->`
  :raw-html:`<td class="no"></td> <!-- Sparc -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
  :raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_disassembler">disassembler</a></td>`
  :raw-html:`<td class="yes"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="no"></td> <!-- Hexagon -->`
-:raw-html:`<td class="yes"></td> <!-- MBlaze -->`
  :raw-html:`<td class="no"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="no"></td> <!-- PTX -->`
+:raw-html:`<td class="na"></td> <!-- NVPTX -->`
  :raw-html:`<td class="no"></td> <!-- PowerPC -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="no"></td> <!-- Sparc -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
-:raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_inlineasm">inline asm</a></td>`
  :raw-html:`<td class="yes"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="yes"></td> <!-- Hexagon -->`
-:raw-html:`<td class="yes"></td> <!-- MBlaze -->`
  :raw-html:`<td class="unknown"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="unknown"></td> <!-- PTX -->`
+:raw-html:`<td class="yes"></td> <!-- NVPTX -->`
  :raw-html:`<td class="yes"></td> <!-- PowerPC -->`
  :raw-html:`<td class="unknown"></td> <!-- Sparc -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
-:raw-html:`<td class="unknown"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_jit">jit</a></td>`
  :raw-html:`<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="no"></td> <!-- Hexagon -->`
-:raw-html:`<td class="no"></td> <!-- MBlaze -->`
  :raw-html:`<td class="unknown"></td> <!-- MSP430 -->`
  :raw-html:`<td class="yes"></td> <!-- Mips -->`
-:raw-html:`<td class="unknown"></td> <!-- PTX -->`
+:raw-html:`<td class="na"></td> <!-- NVPTX -->`
  :raw-html:`<td class="yes"></td> <!-- PowerPC -->`
  :raw-html:`<td class="unknown"></td> <!-- Sparc -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
-:raw-html:`<td class="unknown"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_objectwrite">.o&nbsp;file writing</a></td>`
  :raw-html:`<td class="no"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="no"></td> <!-- Hexagon -->`
-:raw-html:`<td class="yes"></td> <!-- MBlaze -->`
  :raw-html:`<td class="no"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="no"></td> <!-- PTX -->`
+:raw-html:`<td class="na"></td> <!-- NVPTX -->`
  :raw-html:`<td class="no"></td> <!-- PowerPC -->`
  :raw-html:`<td class="no"></td> <!-- Sparc -->`
+:raw-html:`<td class="yes"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
  :raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="yes"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_tailcall">tail calls</a></td>`
  :raw-html:`<td class="yes"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="yes"></td> <!-- Hexagon -->`
-:raw-html:`<td class="no"></td> <!-- MBlaze -->`
  :raw-html:`<td class="unknown"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="unknown"></td> <!-- PTX -->`
+:raw-html:`<td class="no"></td> <!-- NVPTX -->`
  :raw-html:`<td class="yes"></td> <!-- PowerPC -->`
  :raw-html:`<td class="unknown"></td> <!-- Sparc -->`
+:raw-html:`<td class="no"></td> <!-- SystemZ -->`
  :raw-html:`<td class="yes"></td> <!-- X86 -->`
-:raw-html:`<td class="unknown"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`<tr>`
  :raw-html:`<td><a href="#feat_segstacks">segmented stacks</a></td>`
  :raw-html:`<td class="no"></td> <!-- ARM -->`
-:raw-html:`<td class="no"></td> <!-- CellSPU -->`
  :raw-html:`<td class="no"></td> <!-- Hexagon -->`
-:raw-html:`<td class="no"></td> <!-- MBlaze -->`
  :raw-html:`<td class="no"></td> <!-- MSP430 -->`
  :raw-html:`<td class="no"></td> <!-- Mips -->`
-:raw-html:`<td class="no"></td> <!-- PTX -->`
+:raw-html:`<td class="no"></td> <!-- NVPTX -->`
  :raw-html:`<td class="no"></td> <!-- PowerPC -->`
  :raw-html:`<td class="no"></td> <!-- Sparc -->`
+:raw-html:`<td class="no"></td> <!-- SystemZ -->`
  :raw-html:`<td class="partial"><a href="#feat_segstacks_x86">*</a></td> <!-- X86 -->`
  :raw-html:`<td class="no"></td> <!-- XCore -->`
+:raw-html:`<td class="no"></td> <!-- eBPF -->`
  :raw-html:`</tr>`
  
  :raw-html:`</table>`
@@ -1966,7 +2008,7 @@ Tail Calls
  
  This box indicates whether the target supports guaranteed tail calls.  These are
  calls marked "`tail <LangRef.html#i_call>`_" and use the fastcc calling
-convention.  Please see the `tail call section more more details`_.
+convention.  Please see the `tail call section`_ for more details.
  
  .. _feat_segstacks:
  
@@ -1984,7 +2026,7 @@ Basic support exists on the X86 backend. Currently vararg doesn't work and the
  object files are not marked the way the gold linker expects, but simple Go
  programs can be built by dragonegg.
  
-.. _tail call section more more details:
+.. _tail call section:
  
  Tail call optimization
  ----------------------
@@ -1992,15 +2034,15 @@ Tail call optimization
  Tail call optimization, callee reusing the stack of the caller, is currently
  supported on x86/x86-64 and PowerPC. It is performed if:
  
-* Caller and callee have the calling convention ``fastcc`` or ``cc 10`` (GHC
-  call convention).
+* Caller and callee have the calling convention ``fastcc``, ``cc 10`` (GHC
+  calling convention) or ``cc 11`` (HiPE calling convention).
  
  * The call is a tail call - in tail position (ret immediately follows call and
    ret uses value of call or is void).
  
  * Option ``-tailcallopt`` is enabled.
  
-* Platform specific constraints are met.
+* Platform-specific constraints are met.
  
  x86/x86-64 constraints:
  
@@ -2370,17 +2412,17 @@ Dynamic Allocation
  
    TODO - More to come.
  
-The PTX backend
----------------
+The NVPTX backend
+-----------------
  
-The PTX code generator lives in the lib/Target/PTX directory. It is currently a
-work-in-progress, but already supports most of the code generation functionality
-needed to generate correct PTX kernels for CUDA devices.
+The NVPTX code generator under lib/Target/NVPTX is an open-source version of
+the NVIDIA NVPTX code generator for LLVM.  It is contributed by NVIDIA and is
+a port of the code generator used in the CUDA compiler (nvcc).  It targets the
+PTX 3.0/3.1 ISA and can target any compute capability greater than or equal to
+2.0 (Fermi).
  
-The code generator can target PTX 2.0+, and shader model 1.0+.  The PTX ISA
-Reference Manual is used as the primary source of ISA information, though an
-effort is made to make the output of the code generator match the output of the
-NVidia nvcc compiler, whenever possible.
+This target is of production quality and should be completely compatible with
+the official NVIDIA toolchain.
  
  Code Generator Options:
  
@@ -2390,39 +2432,216 @@ Code Generator Options:
  :raw-html:`<th>Description</th>`
  :raw-html:`</tr>`
  :raw-html:`<tr>`
-:raw-html:`<td>``double``</td>`
-:raw-html:`<td align="left">If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic</td>`
+:raw-html:`<td>sm_20</td>`
+:raw-html:`<td align="left">Set shader model/compute capability to 2.0</td>`
+:raw-html:`</tr>`
+:raw-html:`<tr>`
+:raw-html:`<td>sm_21</td>`
+:raw-html:`<td align="left">Set shader model/compute capability to 2.1</td>`
  :raw-html:`</tr>`
  :raw-html:`<tr>`
-:raw-html:`<td>``no-fma``</td>`
-:raw-html:`<td align="left">Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devices</td>`
+:raw-html:`<td>sm_30</td>`
+:raw-html:`<td align="left">Set shader model/compute capability to 3.0</td>`
  :raw-html:`</tr>`
  :raw-html:`<tr>`
-:raw-html:`<td>``smxy / computexy``</td>`
-:raw-html:`<td align="left">Set shader model/compute capability to x.y, e.g. sm20 or compute13</td>`
+:raw-html:`<td>sm_35</td>`
+:raw-html:`<td align="left">Set shader model/compute capability to 3.5</td>`
+:raw-html:`</tr>`
+:raw-html:`<tr>`
+:raw-html:`<td>ptx30</td>`
+:raw-html:`<td align="left">Target PTX 3.0</td>`
+:raw-html:`</tr>`
+:raw-html:`<tr>`
+:raw-html:`<td>ptx31</td>`
+:raw-html:`<td align="left">Target PTX 3.1</td>`
  :raw-html:`</tr>`
  :raw-html:`</table>`
  
-Working:
+The extended Berkeley Packet Filter (eBPF) backend
+--------------------------------------------------
+
+Extended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used
+to filter network packets.  The
+`bpf() system call <http://man7.org/linux/man-pages/man2/bpf.2.html>`_
+performs a range of operations related to eBPF.  For both cBPF and eBPF
+programs, the Linux kernel statically analyzes the programs before loading
+them, in order to ensure that they cannot harm the running system.  eBPF is
+a 64-bit RISC instruction set designed for one-to-one mapping to 64-bit CPUs.
+Opcodes are 8-bit encoded, and 87 instructions are defined.  There are 10
+registers, grouped by function as outlined below.
+
+::
+
+  R0        return value from in-kernel functions; exit value for eBPF program
+  R1 - R5   function call arguments to in-kernel functions
+  R6 - R9   callee-saved registers preserved by in-kernel functions
+  R10       stack frame pointer (read only)
+
+Instruction encoding (arithmetic and jump)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+eBPF reuses most of the opcode encoding from classic BPF to simplify conversion
+of classic BPF to eBPF.  For arithmetic and jump instructions the 8-bit 'code'
+field is divided into three parts:
+
+::
+
+  +----------------+--------+--------------------+
+  |   4 bits       |  1 bit |   3 bits           |
+  | operation code | source | instruction class  |
+  +----------------+--------+--------------------+
+  (MSB)                                      (LSB)
+
+The three least significant bits store the instruction class, which is one of:
+
+::
+
+  BPF_LD     0x0
+  BPF_LDX    0x1
+  BPF_ST     0x2
+  BPF_STX    0x3
+  BPF_ALU    0x4
+  BPF_JMP    0x5
+  (unused)   0x6
+  BPF_ALU64  0x7
+
+When BPF_CLASS(code) == BPF_ALU or BPF_ALU64 or BPF_JMP,
+the 4th bit encodes the source operand:
+
+::
+
+  BPF_X     0x1  use src_reg register as source operand
+  BPF_K     0x0  use 32-bit immediate as source operand
+
+and the four MSBs store the operation code:
  
-* Arithmetic instruction selection (including combo FMA)
+::
+
+  BPF_ADD   0x0  add
+  BPF_SUB   0x1  subtract
+  BPF_MUL   0x2  multiply
+  BPF_DIV   0x3  divide
+  BPF_OR    0x4  bitwise logical OR
+  BPF_AND   0x5  bitwise logical AND
+  BPF_LSH   0x6  left shift
+  BPF_RSH   0x7  right shift (zero extended)
+  BPF_NEG   0x8  arithmetic negation
+  BPF_MOD   0x9  modulo
+  BPF_XOR   0xa  bitwise logical XOR
+  BPF_MOV   0xb  move register to register
+  BPF_ARSH  0xc  right shift (sign extended)
+  BPF_END   0xd  endianness conversion
+
+If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of
+
+::
  
-* Bitwise instruction selection
+  BPF_JA    0x0  unconditional jump
+  BPF_JEQ   0x1  jump ==
+  BPF_JGT   0x2  jump >
+  BPF_JGE   0x3  jump >=
+  BPF_JSET  0x4  jump if (DST & SRC)
+  BPF_JNE   0x5  jump !=
+  BPF_JSGT  0x6  jump signed >
+  BPF_JSGE  0x7  jump signed >=
+  BPF_CALL  0x8  function call
+  BPF_EXIT  0x9  function return
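+
+As a rough illustration of the layout described above, the following Python
+sketch splits a 'code' byte into its three fields.  The function and table
+names are hypothetical and are not part of LLVM or the kernel; only the
+numeric values are taken from the tables in this section.  The source bit is
+returned as a raw 0/1 value rather than a symbolic name.
+
```python
# Illustrative decoder for the 8-bit 'code' field of arithmetic and jump
# instructions. All names here are hypothetical helpers for this sketch.

CLASSES = {0x0: "BPF_LD", 0x1: "BPF_LDX", 0x2: "BPF_ST", 0x3: "BPF_STX",
           0x4: "BPF_ALU", 0x5: "BPF_JMP", 0x7: "BPF_ALU64"}

# Index in each list is the 4-bit operation code from the tables above.
ALU_OPS = ["BPF_ADD", "BPF_SUB", "BPF_MUL", "BPF_DIV", "BPF_OR", "BPF_AND",
           "BPF_LSH", "BPF_RSH", "BPF_NEG", "BPF_MOD", "BPF_XOR", "BPF_MOV",
           "BPF_ARSH", "BPF_END"]
JMP_OPS = ["BPF_JA", "BPF_JEQ", "BPF_JGT", "BPF_JGE", "BPF_JSET", "BPF_JNE",
           "BPF_JSGT", "BPF_JSGE", "BPF_CALL", "BPF_EXIT"]


def decode_alu_jmp(code):
    """Split an 8-bit code into (class name, source bit, operation name)."""
    cls = CLASSES.get(code & 0x07)   # three LSBs: instruction class
    source = (code >> 3) & 0x1       # 4th bit: source operand selector
    op = (code >> 4) & 0x0F          # four MSBs: operation code
    if cls in ("BPF_ALU", "BPF_ALU64"):
        names = ALU_OPS
    elif cls == "BPF_JMP":
        names = JMP_OPS
    else:
        raise ValueError("not an arithmetic or jump instruction")
    if op >= len(names):
        raise ValueError("undefined operation code")
    return cls, source, names[op]
```
+
+With these tables, ``decode_alu_jmp(0x95)`` yields class BPF_JMP with
+operation BPF_EXIT, and ``decode_alu_jmp(0x07)`` yields class BPF_ALU64 with
+operation BPF_ADD.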
+
+Instruction encoding (load, store)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+For load and store instructions the 8-bit 'code' field is divided as:
  
-* Control-flow instruction selection
+::
+
+  +--------+--------+-------------------+
+  | 3 bits | 2 bits |   3 bits          |
+  |  mode  |  size  | instruction class |
+  +--------+--------+-------------------+
+  (MSB)                             (LSB)
  
-* Function calls (only on SM 2.0+ and no return arguments)
+Size modifier is one of
  
-* Addresses spaces (0 = global, 1 = constant, 2 = local, 4 = shared)
+::
  
-* Thread synchronization (bar.sync)
+  BPF_W       0x0  word
+  BPF_H       0x1  half word
+  BPF_B       0x2  byte
+  BPF_DW      0x3  double word
  
-* Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)
+Mode modifier is one of
  
-In Progress:
+::
  
-* Robust call instruction selection
+  BPF_IMM     0x0  immediate
+  BPF_ABS     0x1  used to access packet data
+  BPF_IND     0x2  used to access packet data
+  BPF_MEM     0x3  memory
+  (reserved)  0x4
+  (reserved)  0x5
+  BPF_XADD    0x6  exclusive add
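+
+The load/store layout can be decoded in the same spirit.  The sketch below is
+only an illustration: ``decode_ldst`` and its tables are hypothetical names,
+and the values are copied from the size, mode and instruction class tables of
+this section.
+
```python
# Illustrative decoder for the 8-bit 'code' field of load/store instructions.
# All names here are hypothetical helpers for this sketch.

LDST_CLASSES = {0x0: "BPF_LD", 0x1: "BPF_LDX", 0x2: "BPF_ST", 0x3: "BPF_STX"}
SIZES = ["BPF_W", "BPF_H", "BPF_B", "BPF_DW"]           # 2-bit size field
MODES = {0x0: "BPF_IMM", 0x1: "BPF_ABS", 0x2: "BPF_IND",
         0x3: "BPF_MEM", 0x6: "BPF_XADD"}               # 0x4, 0x5 reserved


def decode_ldst(code):
    """Split an 8-bit code into (class name, size name, mode name)."""
    cls = code & 0x07            # three LSBs: instruction class
    size = (code >> 3) & 0x03    # next two bits: size modifier
    mode = (code >> 5) & 0x07    # three MSBs: mode modifier
    if cls not in LDST_CLASSES:
        raise ValueError("not a load or store instruction")
    if mode not in MODES:
        raise ValueError("reserved mode")
    return LDST_CLASSES[cls], SIZES[size], MODES[mode]
```
+
+For example, ``decode_ldst(0x61)`` gives a BPF_LDX word load in BPF_MEM mode.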
  
-* Stack frame allocation
  
-* Device-specific instruction scheduling optimizations
+Packet data access (BPF_ABS, BPF_IND)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There are two non-generic instructions, (BPF_ABS | <size> | BPF_LD) and
+(BPF_IND | <size> | BPF_LD), which are used to access packet data.
+Register R6 is an implicit input that must contain a pointer to an sk_buff.
+Register R0 is an implicit output which contains the data fetched
+from the packet.  Registers R1-R5 are scratch registers and must not
+be used to store data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD
+instructions.  These instructions also have an implicit program exit
+condition: when an eBPF program tries to access data beyond
+the packet boundary, the interpreter aborts the execution of the program.
+
+BPF_IND | BPF_W | BPF_LD is equivalent to:
+  R0 = ntohl(\*(u32 \*) (((struct sk_buff \*) R6)->data + src_reg + imm32))
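+
+The semantics of the word-sized indirect load, including the implicit exit
+condition, can be modelled roughly as follows.  This is a user-space sketch
+only: the packet is assumed to be a plain ``bytes`` object standing in for
+``skb->data``, and ``ProgramAbort`` and ``load_ind_word`` are hypothetical
+names.
+
```python
import struct


class ProgramAbort(Exception):
    """Hypothetical signal that execution of the program was aborted."""


def load_ind_word(packet, src_reg, imm32):
    """Model R0 = ntohl(*(u32 *)(data + src_reg + imm32)) on a bytes object."""
    offset = src_reg + imm32
    # Implicit exit condition: reading outside the packet aborts the program.
    if offset < 0 or offset + 4 > len(packet):
        raise ProgramAbort("access beyond packet boundary")
    # "!I" reads an unsigned 32-bit value in network (big-endian) byte order,
    # which is the value ntohl() would produce from the raw packet bytes.
    (value,) = struct.unpack_from("!I", packet, offset)
    return value
```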
+
+eBPF maps
+^^^^^^^^^
+
+eBPF maps are provided for sharing data between the kernel and user space.
+Currently implemented types are hash and array, with potential extension to
+support bloom filters, radix trees, etc.  A map is defined by its type,
+maximum number of elements, key size and value size in bytes.  The bpf()
+system call supports create, update, find and delete functions on maps.
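+
+To make the map parameters concrete, here is a toy user-space model of a
+hash-type map.  It is not the kernel interface and does not call bpf(); the
+class and method names are hypothetical and only mirror the create, update,
+find and delete operations and the size limits mentioned above.
+
```python
class MapFull(Exception):
    """Hypothetical error: the map already holds max_entries elements."""


class ToyHashMap:
    """User-space toy model of a hash-type map; not the bpf() interface."""

    def __init__(self, max_entries, key_size, value_size):
        # "Create": a map is defined by element limit, key size, value size.
        self.max_entries = max_entries
        self.key_size = key_size
        self.value_size = value_size
        self._data = {}

    def update(self, key, value):
        # Keys and values are fixed-size byte strings.
        if len(key) != self.key_size or len(value) != self.value_size:
            raise ValueError("key or value has the wrong size")
        if key not in self._data and len(self._data) >= self.max_entries:
            raise MapFull("map is full")
        self._data[key] = value

    def find(self, key):
        # Returns the stored value, or None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Returns True when an element was removed, False otherwise.
        return self._data.pop(key, None) is not None
```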
+
+Function calls
+^^^^^^^^^^^^^^
+
+Function call arguments are passed using up to five registers (R1 - R5).
+The return value is passed in a dedicated register (R0).  Four additional
+registers (R6 - R9) are callee-saved, and the values in these registers
+are preserved within kernel functions.  R0 - R5 are scratch registers within
+kernel functions, and eBPF programs must therefore store/restore values in
+these registers if needed across function calls.  The stack can be accessed
+using the read-only frame pointer R10.  eBPF registers map 1:1 to hardware
+registers on x86_64 and other 64-bit architectures.  For example, x86_64
+in-kernel JIT maps them as
+
+::
+
+  R0 - rax
+  R1 - rdi
+  R2 - rsi
+  R3 - rdx
+  R4 - rcx
+  R5 - r8
+  R6 - rbx
+  R7 - r13
+  R8 - r14
+  R9 - r15
+  R10 - rbp
+
+since the x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
+and rbx, r12 - r15 are callee-saved.
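+
+The reasoning behind this mapping can be checked mechanically.  The sketch
+below simply restates the table and the ABI register lists from the text as
+Python data (all names are hypothetical) and confirms that argument registers
+map to argument registers and callee-saved registers to callee-saved ones.
+
```python
# Register mapping copied from the table above.
X86_64_JIT_MAP = {
    "R0": "rax", "R1": "rdi", "R2": "rsi", "R3": "rdx", "R4": "rcx",
    "R5": "r8", "R6": "rbx", "R7": "r13", "R8": "r14", "R9": "r15",
    "R10": "rbp",
}

# x86_64 register lists as given in the text.
X86_64_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
X86_64_CALLEE_SAVED = {"rbx", "r12", "r13", "r14", "r15"}


def check_mapping():
    """Return (args_ok, saved_ok) for the mapping above."""
    # R1 - R5 should line up with the first five ABI argument registers.
    args = [X86_64_JIT_MAP["R%d" % i] for i in range(1, 6)]
    args_ok = args == X86_64_ARG_REGS[:5]
    # R6 - R9 should all land on callee-saved hardware registers.
    saved_ok = all(X86_64_JIT_MAP["R%d" % i] in X86_64_CALLEE_SAVED
                   for i in range(6, 10))
    return args_ok, saved_ok
```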
+
+Program start
+^^^^^^^^^^^^^
+
+An eBPF program receives a single argument and contains
+a single eBPF main routine; the program does not contain eBPF functions.
+Function calls are limited to a predefined set of kernel functions.  The size
+of a program is limited to 4K instructions:  this ensures fast termination and
+a limited number of kernel function calls.  Prior to running an eBPF program,
+a verifier performs static analysis to prevent loops in the code and
+to ensure valid register usage and operand types.
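+
+The size limit and the no-loops rule can be illustrated with a deliberately
+simplified check.  This is not how the in-kernel verifier works; it is a
+conservative stand-in that rejects every backward jump.  The instruction
+representation (a list of ``(mnemonic, jump_offset)`` tuples) and the function
+name are hypothetical, and jump offsets are assumed to be relative to the
+following instruction.
+
```python
MAX_INSNS = 4096  # the "4K instructions" limit from the text


def passes_basic_checks(program):
    """program: list of (mnemonic, jump_offset) with None for non-jumps."""
    # Reject empty programs and programs above the size limit.
    if not program or len(program) > MAX_INSNS:
        return False
    for pc, (_mnemonic, off) in enumerate(program):
        if off is None:
            continue
        target = pc + 1 + off
        # A negative offset jumps to the same or an earlier instruction,
        # which could form a loop; a target past the end is invalid.
        if off < 0 or target >= len(program):
            return False
    return True
```
+
+Rejecting all backward jumps is stricter than necessary, but it is enough to
+guarantee that every accepted program terminates.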