X-Git-Url: http://plrg.eecs.uci.edu/git/?p=oota-llvm.git;a=blobdiff_plain;f=docs%2FCodeGenerator.rst;h=f3b949c7ad157fac32e43ca1647a07b9a757e8ee;hp=3bc8ce1a9708ed4b5744f4e7e69e856ff4a51493;hb=49c02eaad55b68f58d4abdbc22eb08b92a024437;hpb=d30c11eddee0c6f17a65303c249b77d14962c973 diff --git a/docs/CodeGenerator.rst b/docs/CodeGenerator.rst index 3bc8ce1a970..f3b949c7ad1 100644 --- a/docs/CodeGenerator.rst +++ b/docs/CodeGenerator.rst @@ -290,10 +290,10 @@ the opcode, the number of operands, the list of implicit register uses and defs, whether the instruction has certain target-independent properties (accesses memory, is commutable, etc), and holds any target-specific flags. -The ``TargetFrameInfo`` class ------------------------------ +The ``TargetFrameLowering`` class +--------------------------------- -The ``TargetFrameInfo`` class is used to provide information about the stack +The ``TargetFrameLowering`` class is used to provide information about the stack frame layout of the target. It holds the direction of stack growth, the known stack alignment on entry to each function, and the offset to the local area. The offset to the local area is the offset from the stack pointer on function @@ -464,7 +464,7 @@ code: mov %EAX, %EDX sar %EDX, 31 idiv %ECX - ret + ret This approach is extremely general (if it can handle the X86 architecture, it can handle anything!) and allows all of the target specific knowledge about the @@ -640,7 +640,7 @@ For target specific directives, the MCStreamer has a MCTargetStreamer instance. Each target that needs it defines a class that inherits from it and is a lot like MCStreamer itself: It has one method per directive and two classes that inherit from it, a target object streamer and a target asm streamer. The target -asm streamer just prints it (``emitFnStart -> .fnstrart``), and the object +asm streamer just prints it (``emitFnStart -> .fnstart``), and the object streamer implement the assembler logic for it. To make llvm use these classes, the target initialization must call @@ -749,7 +749,7 @@ The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the ``SDNode`` class. The primary payload of the ``SDNode`` is its operation code (Opcode) that indicates what operation the node performs and the operands to the operation. The various operation node types are described at the top of the -``include/llvm/CodeGen/SelectionDAGNodes.h`` file. +``include/llvm/CodeGen/ISDOpcodes.h`` file. Although most operations define a single value, each node in the graph may define multiple values. For example, a combined div/rem operation will define @@ -769,7 +769,9 @@ provide an ordering between nodes that have side effects (such as loads, stores, calls, returns, etc). All nodes that have side effects should take a token chain as input and produce a new one as output. By convention, token chain inputs are always operand #0, and chain results are always the last value -produced by an operation. +produced by an operation. However, after instruction selection, the +machine nodes have their chain after the instruction's operands, and +may be followed by glue nodes. A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is always a marker node with an Opcode of ``ISD::EntryToken``. The Root node is @@ -827,7 +829,7 @@ One great way to visualize what is going on here is to take advantage of a few LLC command line options. The following options pop up a window displaying the SelectionDAG at specific times (if you only get errors printed to the console while using this, you probably `need to configure your -system `_ to add support for it). +system `_ to add support for it). * ``-view-dag-combine1-dags`` displays the DAG after being built, before the first optimization pass. @@ -846,6 +848,10 @@ is based on the final SelectionDAG, with nodes that must be scheduled together bundled into a single scheduling-unit node, and with immediate operands and other nodes that aren't relevant for scheduling omitted. +The option ``-filter-view-dags`` allows to select the name of the basic block +that you are interested to visualize and filters all the previous +``view-*-dags`` options. + .. _Build initial DAG: Initial SelectionDAG Construction @@ -1228,7 +1234,7 @@ used. Each virtual register can only be mapped to physical registers of a particular class. For instance, in the X86 architecture, some virtuals can only be allocated to 8 bit registers. A register class is described by ``TargetRegisterClass`` objects. To discover if a virtual register is -compatible with a given physical, this code can be used:

+compatible with a given physical, this code can be used: .. code-block:: c++ @@ -1334,7 +1340,7 @@ found before being stored or after being reloaded. If the indirect strategy is used, after all the virtual registers have been mapped to physical registers or stack slots, it is necessary to use a spiller object to place load and store instructions in the code. Every virtual that has -been mapped to a stack slot will be stored to memory after been defined and will +been mapped to a stack slot will be stored to memory after being defined and will be loaded before being used. The implementation of the spiller tries to recycle load/store instructions, avoiding unnecessary instructions. For an example of how to invoke the spiller, see ``RegAllocLinearScan::runOnMachineFunction`` in @@ -1347,7 +1353,7 @@ With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is expected to define at most one register, and to use at most two registers. However, some architectures use two address instructions. In this case, the -defined register is also one of the used register. For instance, an instruction +defined register is also one of the used registers. For instance, an instruction such as ``ADD %EAX, %EBX``, in X86 is actually equivalent to ``%EAX = %EAX + %EBX``. @@ -1572,7 +1578,7 @@ three important things that you have to implement for your target: correspond to. The MCInsts that are generated by this are fed into the instruction printer or the encoder. -Finally, at your choosing, you can also implement an subclass of MCCodeEmitter +Finally, at your choosing, you can also implement a subclass of MCCodeEmitter which lowers MCInst's into machine code bytes and relocations. This is important if you want to support direct .o file emission, or would like to implement an assembler for your target. @@ -1683,7 +1689,7 @@ ones supported by the matcher), through a Requires clause: def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>; def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>; -In this example, the mnemonic gets mapped into different a new one depending on +In this example, the mnemonic gets mapped into a different one depending on the current instruction set. Instruction Aliases @@ -1808,6 +1814,7 @@ Here is the table: :raw-html:`SystemZ` :raw-html:`X86` :raw-html:`XCore` +:raw-html:`eBPF` :raw-html:`` :raw-html:`` @@ -1822,6 +1829,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1836,6 +1844,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1850,6 +1859,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1864,6 +1874,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1878,6 +1889,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1892,6 +1904,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1906,6 +1919,7 @@ Here is the table: :raw-html:` ` :raw-html:` ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1920,6 +1934,7 @@ Here is the table: :raw-html:` ` :raw-html:`* ` :raw-html:` ` +:raw-html:` ` :raw-html:`` :raw-html:`` @@ -1993,7 +2008,7 @@ Tail Calls This box indicates whether the target supports guaranteed tail calls. These are calls marked "`tail `_" and use the fastcc calling -convention. Please see the `tail call section more more details`_. +convention. Please see the `tail call section`_ for more details. .. _feat_segstacks: @@ -2011,7 +2026,7 @@ Basic support exists on the X86 backend. Currently vararg doesn't work and the object files are not marked the way the gold linker expects, but simple Go programs can be built by dragonegg. -.. _tail call section more more details: +.. _tail call section: Tail call optimization ---------------------- @@ -2027,7 +2042,7 @@ supported on x86/x86-64 and PowerPC. It is performed if: * Option ``-tailcallopt`` is enabled. -* Platform specific constraints are met. +* Platform-specific constraints are met. x86/x86-64 constraints: @@ -2442,3 +2457,191 @@ Code Generator Options: :raw-html:`` :raw-html:`` +The extended Berkeley Packet Filter (eBPF) backend +-------------------------------------------------- + +Extended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used +to filter network packets. The +`bpf() system call `_ +performs a range of operations related to eBPF. For both cBPF and eBPF +programs, the Linux kernel statically analyzes the programs before loading +them, in order to ensure that they cannot harm the running system. eBPF is +a 64-bit RISC instruction set designed for one to one mapping to 64-bit CPUs. +Opcodes are 8-bit encoded, and 87 instructions are defined. There are 10 +registers, grouped by function as outlined below. + +:: + + R0 return value from in-kernel functions; exit value for eBPF program + R1 - R5 function call arguments to in-kernel functions + R6 - R9 callee-saved registers preserved by in-kernel functions + R10 stack frame pointer (read only) + +Instruction encoding (arithmetic and jump) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +eBPF is reusing most of the opcode encoding from classic to simplify conversion +of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code' +field is divided into three parts: + +:: + + +----------------+--------+--------------------+ + | 4 bits | 1 bit | 3 bits | + | operation code | source | instruction class | + +----------------+--------+--------------------+ + (MSB) (LSB) + +Three LSB bits store instruction class which is one of: + +:: + + BPF_LD 0x0 + BPF_LDX 0x1 + BPF_ST 0x2 + BPF_STX 0x3 + BPF_ALU 0x4 + BPF_JMP 0x5 + (unused) 0x6 + BPF_ALU64 0x7 + +When BPF_CLASS(code) == BPF_ALU or BPF_ALU64 or BPF_JMP, +4th bit encodes source operand + +:: + + BPF_X 0x0 use src_reg register as source operand + BPF_K 0x1 use 32 bit immediate as source operand + +and four MSB bits store operation code + +:: + + BPF_ADD 0x0 add + BPF_SUB 0x1 subtract + BPF_MUL 0x2 multiply + BPF_DIV 0x3 divide + BPF_OR 0x4 bitwise logical OR + BPF_AND 0x5 bitwise logical AND + BPF_LSH 0x6 left shift + BPF_RSH 0x7 right shift (zero extended) + BPF_NEG 0x8 arithmetic negation + BPF_MOD 0x9 modulo + BPF_XOR 0xa bitwise logical XOR + BPF_MOV 0xb move register to register + BPF_ARSH 0xc right shift (sign extended) + BPF_END 0xd endianness conversion + +If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of + +:: + + BPF_JA 0x0 unconditional jump + BPF_JEQ 0x1 jump == + BPF_JGT 0x2 jump > + BPF_JGE 0x3 jump >= + BPF_JSET 0x4 jump if (DST & SRC) + BPF_JNE 0x5 jump != + BPF_JSGT 0x6 jump signed > + BPF_JSGE 0x7 jump signed >= + BPF_CALL 0x8 function call + BPF_EXIT 0x9 function return + +Instruction encoding (load, store) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +For load and store instructions the 8-bit 'code' field is divided as: + +:: + + +--------+--------+-------------------+ + | 3 bits | 2 bits | 3 bits | + | mode | size | instruction class | + +--------+--------+-------------------+ + (MSB) (LSB) + +Size modifier is one of + +:: + + BPF_W 0x0 word + BPF_H 0x1 half word + BPF_B 0x2 byte + BPF_DW 0x3 double word + +Mode modifier is one of + +:: + + BPF_IMM 0x0 immediate + BPF_ABS 0x1 used to access packet data + BPF_IND 0x2 used to access packet data + BPF_MEM 0x3 memory + (reserved) 0x4 + (reserved) 0x5 + BPF_XADD 0x6 exclusive add + + +Packet data access (BPF_ABS, BPF_IND) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Two non-generic instructions: (BPF_ABS | | BPF_LD) and +(BPF_IND | | BPF_LD) which are used to access packet data. +Register R6 is an implicit input that must contain pointer to sk_buff. +Register R0 is an implicit output which contains the data fetched +from the packet. Registers R1-R5 are scratch registers and must not +be used to store the data across BPF_ABS | BPF_LD or BPF_IND | BPF_LD +instructions. These instructions have implicit program exit condition +as well. When eBPF program is trying to access the data beyond +the packet boundary, the interpreter will abort the execution of the program. + +BPF_IND | BPF_W | BPF_LD is equivalent to: + R0 = ntohl(\*(u32 \*) (((struct sk_buff \*) R6)->data + src_reg + imm32)) + +eBPF maps +^^^^^^^^^ + +eBPF maps are provided for sharing data between kernel and user-space. +Currently implemented types are hash and array, with potential extension to +support bloom filters, radix trees, etc. A map is defined by its type, +maximum number of elements, key size and value size in bytes. eBPF syscall +supports create, update, find and delete functions on maps. + +Function calls +^^^^^^^^^^^^^^ + +Function call arguments are passed using up to five registers (R1 - R5). +The return value is passed in a dedicated register (R0). Four additional +registers (R6 - R9) are callee-saved, and the values in these registers +are preserved within kernel functions. R0 - R5 are scratch registers within +kernel functions, and eBPF programs must therefor store/restore values in +these registers if needed across function calls. The stack can be accessed +using the read-only frame pointer R10. eBPF registers map 1:1 to hardware +registers on x86_64 and other 64-bit architectures. For example, x86_64 +in-kernel JIT maps them as + +:: + + R0 - rax + R1 - rdi + R2 - rsi + R3 - rdx + R4 - rcx + R5 - r8 + R6 - rbx + R7 - r13 + R8 - r14 + R9 - r15 + R10 - rbp + +since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing +and rbx, r12 - r15 are callee saved. + +Program start +^^^^^^^^^^^^^ + +An eBPF program receives a single argument and contains +a single eBPF main routine; the program does not contain eBPF functions. +Function calls are limited to a predefined set of kernel functions. The size +of a program is limited to 4K instructions: this ensures fast termination and +a limited number of kernel function calls. Prior to running an eBPF program, +a verifier performs static analysis to prevent loops in the code and +to ensure valid register usage and operand types.