X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FCodeGenerator.html;h=c4b15dfe377e8a4962dd1d24350e9159d9326c42;hb=057a4b40a65692ea54e0a00cb6ea27d0855be51f;hp=4a656a243c47fdafe973ad63042074849eb01dd0;hpb=cb88ec34f0ed7b969e20b8e6e67fee9b29f65dfa;p=oota-llvm.git diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html index 4a656a243c4..c4b15dfe377 100644 --- a/docs/CodeGenerator.html +++ b/docs/CodeGenerator.html @@ -19,9 +19,9 @@ -
+

The LLVM Target-Independent Code Generator -

+
  1. Introduction @@ -50,6 +50,7 @@
  2. The MachineBasicBlock class
  3. The MachineFunction class
  4. +
  5. MachineInstr Bundles
  6. The "MC" Layer @@ -97,6 +98,14 @@
  7. Built in register allocators
  8. Code Emission
  9. +
  10. VLIW Packetizer + +
  11. Implementing a Native Assembler
  12. @@ -114,6 +123,7 @@
  13. Prolog/Epilog
  14. Dynamic Allocation
  15. +
  16. The PTX backend
@@ -127,12 +137,12 @@ -
+

Introduction -

+ -
+

The LLVM target-independent code generator is a framework that provides a suite of reusable components for translating the LLVM internal representation @@ -188,14 +198,12 @@ depend on the target-description and machine code representation classes, ensuring that it is portable.

-
- -
+

Required components in the code generator -

+ -
+

The two pieces of the LLVM code generator are the high-level interface to the code generator and the set of reusable components that can be used to build @@ -223,11 +231,11 @@

- + -
+

The LLVM target-independent code generator is designed to support efficient and quality code generation for standard register-based microprocessors. @@ -297,11 +305,11 @@

- + -
+

The target description classes require a detailed description of the target architecture. These target descriptions often have a large amount of common @@ -324,13 +332,15 @@

+
+ - + -
+

The LLVM target description classes (located in the include/llvm/Target directory) provide an abstract description of @@ -346,14 +356,12 @@ TargetMachine class provides accessors that should be implemented by the target.

-
- - + -
+

The TargetMachine class provides virtual methods that are used to access the target-specific implementations of the various target description @@ -369,11 +377,11 @@

- + -
+

&#13; The TargetData class is the only required target description class, and it is the only class that is not extensible (you cannot derive a new @@ -385,11 +393,11 @@

- + -
+

The TargetLowering class is used by SelectionDAG based instruction selectors primarily to describe how LLVM code should be lowered to @@ -411,11 +419,11 @@

- + -
+

The TargetRegisterInfo class is used to describe the register file of the target and any interactions between the registers.

@@ -445,11 +453,11 @@
- + -
+

The TargetInstrInfo class is used to describe the machine instructions supported by the target. It is essentially an array of @@ -463,11 +471,11 @@

- + -
+

The TargetFrameInfo class is used to provide information about the stack frame layout of the target. It holds the direction of stack growth, the @@ -479,11 +487,11 @@

- + -
+

The TargetSubtarget class is used to provide information about the specific chip set being targeted. A sub-target informs code generation of @@ -495,11 +503,11 @@ -

+ -
+

The TargetJITInfo class exposes an abstract interface used by the Just-In-Time code generator to perform target-specific activities, such as @@ -509,13 +517,15 @@

+
+ - + -
+

At the high-level, LLVM code is translated to a machine specific representation formed out of @@ -528,14 +538,12 @@ SSA representation for machine code, as well as a register allocated, non-SSA form.

-
- - + -
+

Target machine instructions are represented as instances of the MachineInstr class. This class is an extremely abstract way of @@ -576,14 +584,12 @@

Also if the first operand is a def, it is easier to create instructions whose only def is the first operand.

-
- - + -
+

Machine instructions are created by using the BuildMI functions, located in the include/llvm/CodeGen/MachineInstrBuilder.h file. The @@ -630,11 +636,11 @@ MI.addReg(Reg, RegState::Define);

- + -
+

One important issue that the code generator needs to be aware of is the presence of fixed registers. In particular, there are often places in the @@ -702,11 +708,26 @@ ret

-
- Machine code in SSA form +

+ Call-clobbered registers +

+ +
+ +

Some machine instructions, like calls, clobber a large number of physical + registers. Rather than adding <def,dead> operands for + all of them, it is possible to use an MO_RegisterMask operand + instead. The register mask operand holds a bit mask of preserved registers, + and everything else is considered to be clobbered by the instruction.

+
-
+ +

+ Machine code in SSA form +

+ +

MachineInstr's are initially selected in SSA-form, and are maintained in SSA-form until register allocation happens. For the most part, @@ -719,12 +740,14 @@ ret

+
+ - + -
+

The MachineBasicBlock class contains a list of machine instructions (MachineInstr instances). It roughly @@ -737,11 +760,11 @@ ret

- + -
+

The MachineFunction class contains a list of machine basic blocks (MachineBasicBlock instances). It @@ -754,14 +777,99 @@ ret

+ +

+ MachineInstr Bundles +

+ +
+ +

&#13; The LLVM code generator can model sequences of instructions as MachineInstr + bundles. A MI bundle can model a VLIW group / pack which contains an + arbitrary number of parallel instructions. It can also be used to model + a sequential list of instructions (potentially with data dependencies) that + cannot be legally separated (e.g. ARM Thumb2 IT blocks).

+ +

Conceptually a MI bundle is a MI with a number of other MIs nested within: +

+ +
+
+--------------
+|   Bundle   | ---------
+--------------          \
+       |           ----------------
+       |           |      MI      |
+       |           ----------------
+       |                   |
+       |           ----------------
+       |           |      MI      |
+       |           ----------------
+       |                   |
+       |           ----------------
+       |           |      MI      |
+       |           ----------------
+       |
+--------------
+|   Bundle   | --------
+--------------         \
+       |           ----------------
+       |           |      MI      |
+       |           ----------------
+       |                   |
+       |           ----------------
+       |           |      MI      |
+       |           ----------------
+       |                   |
+       |                  ...
+       |
+--------------
+|   Bundle   | --------
+--------------         \
+       |
+      ...
+
+
+ +

&#13; MI bundle support does not change the physical representations of + MachineBasicBlock and MachineInstr. All the MIs (including top level and + nested ones) are stored as a sequential list of MIs. The "bundled" MIs are + marked with the 'InsideBundle' flag. A top level MI with the special BUNDLE + opcode is used to represent the start of a bundle. It's legal to mix BUNDLE + MIs with individual MIs that are not inside bundles and do not represent bundles. +

+ +

MachineInstr passes should operate on a MI bundle as a single unit. Member + methods have been taught to correctly handle bundles and MIs inside bundles. + The MachineBasicBlock iterator has been modified to skip over bundled MIs to + enforce the bundle-as-a-single-unit concept. An alternative iterator + instr_iterator has been added to MachineBasicBlock to allow passes to + iterate over all of the MIs in a MachineBasicBlock, including those which + are nested inside bundles. The top level BUNDLE instruction must have the + correct set of register MachineOperand's that represent the cumulative + inputs and outputs of the bundled MIs.

+ +

Packing / bundling of MachineInstr's should be done as part of the register + allocation super-pass. More specifically, the pass which determines what + MIs should be bundled together must be done after code generator exits SSA + form (i.e. after two-address pass, PHI elimination, and copy coalescing). + Bundles should only be finalized (i.e. adding BUNDLE MIs and input and + output register MachineOperands) after virtual registers have been + rewritten into physical registers. This requirement eliminates the need to + add virtual register operands to BUNDLE instructions which would effectively + double the virtual register def and use lists.

+ +
+ +
- + -
+

The MC Layer is used to represent and process code at the raw machine code @@ -770,7 +878,7 @@ level, devoid of "high level" information like "constant pools", "jump tables", like label names, machine instructions, and sections in the object file. The code in this layer is used for a number of important purposes: the tail end of the code generator uses it to write a .s or .o file, and it is also used by the -llvm-mc tool to implement standalone machine codeassemblers and disassemblers. +llvm-mc tool to implement standalone machine code assemblers and disassemblers.

@@ -779,15 +887,12 @@ of important subsystems that interact at this layer, they are described later in this manual.

-
- - - + -
+

MCStreamer is best thought of as an assembler API. It is an abstract API which @@ -817,11 +922,11 @@ MCObjectStreamer implements a full assembler.

- + -
+

The MCContext class is the owner of a variety of uniqued data structures at the @@ -832,11 +937,11 @@ interact with to create symbols and sections. This class can not be subclassed.

- + -
+

The MCSymbol class represents a symbol (aka label) in the assembly file. There @@ -864,11 +969,11 @@ like this to the .s file:

- + -
+

The MCSection class represents an object-file specific section. It is subclassed @@ -882,11 +987,11 @@ directive in a .s file).

- + -
+

The MCInst class is a target-independent representation of an instruction. It @@ -904,27 +1009,26 @@ printer, and the type generated by the assembly parser and disassembler.

+
- + -
+

This section documents the phases described in the high-level design of the code generator. It explains how they work and some of the rationale behind their design.

-
- - + -
+

Instruction Selection is the process of translating LLVM code presented to the code generator into target-specific machine instructions. There are @@ -936,14 +1040,12 @@ printer, and the type generated by the assembly parser and disassembler. selector to be generated from these .td files, though currently there are still things that require custom C++ code.

-
- - + -
+

The SelectionDAG provides an abstraction for code representation in a way that is amenable to instruction selection using automatic techniques @@ -1001,11 +1103,11 @@ printer, and the type generated by the assembly parser and disassembler.

- + -
+

SelectionDAG-based instruction selection consists of the following steps:

@@ -1082,11 +1184,11 @@ printer, and the type generated by the assembly parser and disassembler.
- + -
+

The initial SelectionDAG is naïvely peephole expanded from the LLVM input by the SelectionDAGLowering class in the @@ -1102,11 +1204,11 @@ printer, and the type generated by the assembly parser and disassembler.

- + -
+

The Legalize phase is in charge of converting a DAG to only use the types that are natively supported by the target.

@@ -1135,11 +1237,11 @@ printer, and the type generated by the assembly parser and disassembler.
- + -
+

The Legalize phase is in charge of converting a DAG to only use the operations that are natively supported by the target.

@@ -1167,12 +1269,13 @@ printer, and the type generated by the assembly parser and disassembler.
- +

+ + SelectionDAG Optimization Phase: the DAG Combiner + +

-
+

The SelectionDAG optimization phase is run multiple times for code generation, immediately after the DAG is built and once after each @@ -1202,11 +1305,11 @@ printer, and the type generated by the assembly parser and disassembler.

- + -
+

The Select phase is the bulk of the target-specific code for instruction selection. This phase takes a legal SelectionDAG as input, pattern matches @@ -1363,11 +1466,11 @@ def : Pat<(i32 imm:$imm),

- + -
+

The scheduling phase takes the DAG of target instructions from the selection phase and assigns an order. The scheduler can pick an order depending on @@ -1384,11 +1487,11 @@ def : Pat<(i32 imm:$imm),

- + -
+
  1. Optional function-at-a-time selection.
  2. @@ -1398,18 +1501,20 @@ def : Pat<(i32 imm:$imm),
+
+ - -

To Be Written

+ +

To Be Written

- + -
+

Live Intervals are the ranges (intervals) where a variable is live. They are used by some register allocator passes to @@ -1417,14 +1522,12 @@ def : Pat<(i32 imm:$imm), register are live at the same point in the program (i.e., they conflict). When this situation occurs, one virtual register must be spilled.

-
- - + -
+

The first step in determining the live intervals of variables is to calculate the set of registers that are immediately dead after the instruction (i.e., @@ -1466,11 +1569,11 @@ def : Pat<(i32 imm:$imm),

- + -
+

We now have the information available to perform the live intervals analysis and build the live intervals themselves. We start off by numbering the basic @@ -1485,12 +1588,14 @@ def : Pat<(i32 imm:$imm),

+
+ - + -
+

The Register Allocation problem consists in mapping a program Pv, that can use an unbounded number of virtual registers, @@ -1500,23 +1605,21 @@ def : Pat<(i32 imm:$imm), accommodate all the virtual registers, some of them will have to be mapped into memory. These virtuals are called spilled virtuals.

-
- - + -
+

In LLVM, physical registers are denoted by integer numbers that normally range from 1 to 1023. To see how this numbering is defined for a particular architecture, you can read the GenRegisterNames.inc file for that architecture. For instance, by - inspecting lib/Target/X86/X86GenRegisterNames.inc we see that the - 32-bit register EAX is denoted by 15, and the MMX register - MM0 is mapped to 48.

+ inspecting lib/Target/X86/X86GenRegisterInfo.inc we see that the + 32-bit register EAX is denoted by 43, and the MMX register + MM0 is mapped to 65.

Some architectures contain registers that share the same physical location. A notable example is the X86 platform. For instance, in the X86 architecture, @@ -1524,7 +1627,7 @@ def : Pat<(i32 imm:$imm), bits. These physical registers are marked as aliased in LLVM. Given a particular architecture, you can check which registers are aliased by inspecting its RegisterInfo.td file. Moreover, the method - TargetRegisterInfo::getAliasSet(p_reg) returns an array containing + MCRegisterInfo::getAliasSet(p_reg) returns an array containing all the physical registers aliased to the register p_reg.

Physical registers, in LLVM, are grouped in Register Classes. @@ -1617,11 +1720,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf, -

+ -
+

There are two ways to map virtual registers to physical registers (or to memory slots). The first way, that we will call direct mapping, is @@ -1667,11 +1770,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

- + -
+

With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is @@ -1703,11 +1806,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

- + -
+

An important transformation that happens during register allocation is called the SSA Deconstruction Phase. The SSA form simplifies many analyses @@ -1727,11 +1830,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

- + -
+

Instruction folding is an optimization performed during register allocation that removes unnecessary copy instructions. For instance, a @@ -1764,32 +1867,38 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf, -

+ -
+

The LLVM infrastructure provides the application developer with three different register allocators:

    -
  • Linear ScanThe default allocator. This is the - well-know linear scan register allocator. Whereas the - Simple and Local algorithms use a direct mapping - implementation technique, the Linear Scan implementation - uses a spiller in order to place load and stores.
  • -
  • Fast — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.
  • +
  • Basic — This is an incremental approach to register + allocation. Live ranges are assigned to registers one at a time in + an order that is driven by heuristics. Since code can be rewritten + on-the-fly during allocation, this framework allows interesting + allocators to be developed as extensions. It is not itself a + production register allocator but is a potentially useful + stand-alone mode for triaging bugs and as a performance baseline. + +
  • GreedyThe default allocator. This is a + highly tuned implementation of the Basic allocator that + incorporates global live range splitting. This allocator works hard + to minimize the cost of spill code. +
  • PBQP — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator. This allocator works by constructing a PBQP problem representing the register allocation problem under consideration, solving this using a PBQP solver, and mapping the solution back to a register assignment.
  • -

The type of register allocator used in llc can be chosen with the @@ -1805,23 +1914,143 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;

+
+ -
+

Prolog/Epilog Code Insertion +

+ +
+ + +

+ Compact Unwind +

+ +
+ +

&#13; Throwing an exception requires unwinding out of a function. The + information on how to unwind a given function is traditionally expressed in + DWARF unwind (a.k.a. frame) info. But that format was originally developed + for debuggers to backtrace, and each Frame Description Entry (FDE) requires + ~20-30 bytes per function. There is also the cost of mapping from an address + in a function to the corresponding FDE at runtime. An alternative unwind + encoding is called compact unwind and requires just 4 bytes per + function.

+ +

The compact unwind encoding is a 32-bit value, which is encoded in an + architecture-specific way. It specifies which registers to restore and from + where, and how to unwind out of the function. When the linker creates a final + linked image, it will create a __TEXT,__unwind_info + section. This section is a small and fast way for the runtime to access + unwind info for any given function. If we emit compact unwind info for the + function, that compact unwind info will be encoded in + the __TEXT,__unwind_info section. If we emit DWARF unwind info, + the __TEXT,__unwind_info section will contain the offset of the + FDE in the __TEXT,__eh_frame section in the final linked + image.

+ +

For X86, there are three modes for the compact unwind encoding:

+ +
+
Function with a Frame Pointer (EBP or RBP)
+

EBP/RBP-based frame, where EBP/RBP is pushed + onto the stack immediately after the return address, + then ESP/RSP is moved to EBP/RBP. Thus to + unwind, ESP/RSP is restored with the + current EBP/RBP value, then EBP/RBP is restored + by popping the stack, and the return is done by popping the stack once + more into the PC. All non-volatile registers that need to be restored must + have been saved in a small range on the stack that + starts EBP-4 to EBP-1020 (RBP-8 + to RBP-1020). The offset (divided by 4 in 32-bit mode and 8 + in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). + The registers saved are encoded in bits 0-14 + (mask: 0x00007FFF) as five 3-bit entries from the following + table:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Compact Numberi386 Registerx86-64 Register
1EBXRBX
2ECXR12
3EDXR13
4EDIR14
5ESIR15
6EBPRBP
+ +
+ +
Frameless with a Small Constant Stack Size (EBP + or RBP is not used as a frame pointer)
+

&#13; To return, a constant (encoded in the compact unwind encoding) is added + to the ESP/RSP. Then the return is done by popping the stack + into the PC. All non-volatile registers that need to be restored must have + been saved on the stack immediately after the return address. The stack + size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits + 16-23 (mask: 0x00FF0000). There is a maximum stack size of + 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers + saved is encoded in bits 10-12 (mask: 0x00001C00). Bits 0-9 + (mask: 0x000003FF) contain which registers were saved and + their order. (See + the encodeCompactUnwindRegistersWithoutFrame() function + in lib/Target/X86FrameLowering.cpp for the encoding + algorithm.)

+ +
Frameless with a Large Constant Stack Size (EBP + or RBP is not used as a frame pointer)
+

&#13; This case is like the "Frameless with a Small Constant Stack Size" + case, but the stack size is too large to encode in the compact unwind + encoding. Instead it requires that the function contains "subl + $nnnnnn, %esp" in its prolog. The compact encoding contains the + offset to the $nnnnnn value in the function in bits 10-12 + (mask: 0x00001C00).

+
+ +
+
-

To Be Written

+ - -

To Be Written

+ +

To Be Written

- + -
+

The code emission step of code generation is responsible for lowering from the code generator abstractions (like

+ +

+ VLIW Packetizer +

+ +
+ +

&#13; In a Very Long Instruction Word (VLIW) architecture, the compiler is + responsible for mapping instructions to functional units available on + the architecture. To that end, the compiler creates groups of instructions + called packets or bundles. The VLIW packetizer in LLVM is + a target-independent mechanism to enable the packetization of machine + instructions.

+ + + +

+ Mapping from instructions to functional units +

+ +
+ +

Instructions in a VLIW target can typically be mapped to multiple functional +units. During the process of packetizing, the compiler must be able to reason +about whether an instruction can be added to a packet. This decision can be +complex since the compiler has to examine all possible mappings of instructions +to functional units. Therefore to alleviate compilation-time complexity, the +VLIW packetizer parses the instruction classes of a target and generates tables +at compiler build time. These tables can then be queried by the provided +machine-independent API to determine if an instruction can be accommodated in a +packet.

+
+ + +

+ + How the packetization tables are generated and used + +

+ +
+ +

The packetizer reads instruction classes from a target's itineraries and +creates a deterministic finite automaton (DFA) to represent the state of a +packet. A DFA consists of three major elements: inputs, states, and +transitions. The set of inputs for the generated DFA represents the instruction +being added to a packet. The states represent the possible consumption +of functional units by instructions in a packet. In the DFA, transitions from +one state to another occur on the addition of an instruction to an existing +packet. If there is a legal mapping of functional units to instructions, then +the DFA contains a corresponding transition. The absence of a transition +indicates that a legal mapping does not exist and that the instruction cannot +be added to the packet.

+ +

To generate tables for a VLIW target, add TargetGenDFAPacketizer.inc +as a target to the Makefile in the target directory. The exported API provides +three functions: DFAPacketizer::clearResources(), +DFAPacketizer::reserveResources(MachineInstr *MI), and +DFAPacketizer::canReserveResources(MachineInstr *MI). These functions +allow a target packetizer to add an instruction to an existing packet and to +check whether an instruction can be added to a packet. See +llvm/CodeGen/DFAPacketizer.h for more information.

+ +
+ +
+ +
- + -
+

&#13; Though you're probably reading this because you want to write or maintain a compiler backend, LLVM also fully supports building native assemblers too. @@ -1896,20 +2193,18 @@ We've tried hard to automate the generation of the assembler from the .td files part of the manual and repetitive data entry can be factored and shared with the compiler.

-
- -
Instruction Parsing
+

Instruction Parsing

-

To Be Written

+

To Be Written

-
+

Instruction Alias Processing -

+ -
+

Once the instruction is parsed, it enters the MatchInstructionImpl function. The MatchInstructionImpl function performs alias processing and then does actual matching.

@@ -1922,12 +2217,10 @@ complex/powerful). Generally you want to use the first alias mechanism that meets the needs of your instruction, because it will allow a more concise description.

-
- -
Mnemonic Aliases
+

Mnemonic Aliases

-
+

The first phase of alias processing is simple instruction mnemonic remapping for classes of instructions which are allowed with two different @@ -1965,9 +2258,9 @@ on the current instruction set.

-
Instruction Aliases
+

Instruction Aliases

-
+

The most general phase of alias processing occurs while matching is happening: it provides new forms for the matcher to match along with a specific @@ -2026,38 +2319,40 @@ def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;

Instruction aliases can also have a Requires clause to make them subtarget specific.

-
+

If the back-end supports it, the instruction printer can automatically emit + the alias rather than what's being aliased. It typically leads to better, + more readable code. If it's better to print out what's being aliased, then + pass a '0' as the third parameter to the InstAlias definition.

+
+
-
Instruction Matching
- -

To Be Written

- +

Instruction Matching

+

To Be Written

+
- + -
+

This section of the document explains features or design decisions that are specific to the code generator for a particular target. First we start with a table that summarizes what features are supported by each target.

-
- - + -
+

Note that this table does not include the C backend or Cpp backends, since they do not use the target independent code generator infrastructure. It also @@ -2092,16 +2387,14 @@ is the key:

Feature ARM - Alpha - Blackfin CellSPU + Hexagon MBlaze MSP430 Mips PTX PowerPC Sparc - SystemZ X86 XCore @@ -2109,16 +2402,14 @@ is the key:

is generally reliable - - + - + - @@ -2126,16 +2417,14 @@ is the key:

assembly parser - - + - @@ -2143,16 +2432,14 @@ is the key:

disassembler - - + - @@ -2160,33 +2447,29 @@ is the key:

inline asm - - + - - * + jit * - - + - + - @@ -2194,16 +2477,14 @@ is the key:

.o file writing - - + - @@ -2211,29 +2492,40 @@ is the key:

tail calls - - + - + + segmented stacks + + + + + + + + + + * + + + -
- -
Is Generally Reliable
+

Is Generally Reliable

-
+

This box indicates whether the target is considered to be production quality. This indicates that the target has been used as a static compiler to compile large amounts of code by a variety of different people and is in @@ -2241,9 +2533,9 @@ continuous use.

-
Assembly Parser
+

Assembly Parser

-
+

This box indicates whether the target supports parsing target specific .s files by implementing the MCAsmParser interface. This is required for llvm-mc to be able to act as a native assembler and is required for inline assembly @@ -2253,30 +2545,27 @@ support in the native .o file writer.

-
Disassembler
+

Disassembler

-
+

This box indicates whether the target supports the MCDisassembler API for disassembling machine opcode bytes into MCInst's.

-
Inline Asm
+

Inline Asm

-
+

This box indicates whether the target supports most popular inline assembly constraints and modifiers.

-

X86 lacks reliable support for inline assembly -constraints relating to the X86 floating point stack.

-
-
JIT Support
+

JIT Support

-
+

This box indicates whether the target supports the JIT compiler through the ExecutionEngine interface.

@@ -2286,9 +2575,9 @@ in ARM codegen mode, but lacks NEON and full Thumb support.

-
.o File Writing
+

.o File Writing

-
+

&#13; This box indicates whether the target supports writing .o files (e.g. MachO, ELF, and/or COFF) directly from the target. Note that the target also @@ -2302,9 +2591,9 @@ file to a .o file (as is the case for many C compilers).

-
Tail Calls
+

Tail Calls

-
+

&#13; This box indicates whether the target supports guaranteed tail calls. These are calls marked "tail" and use the fastcc @@ -2313,15 +2602,30 @@ more details.

+ +

Segmented Stacks

+ +
+ +

This box indicates whether the target supports segmented stacks. This +replaces the traditional large C stack with many linked segments. It +is compatible with the gcc +implementation used by the Go front end.

+

Basic support exists on the X86 backend. Currently +vararg doesn't work and the object files are not marked the way the gold +linker expects, but simple Go programs can be built by dragonegg.

+
+ +
- + -
+

&#13; Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:

@@ -2383,11 +2687,11 @@ define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
- + -
+

Sibling call optimization is a restricted form of tail call optimization. Unlike tail call optimization described in the previous section, it can be @@ -2427,24 +2731,22 @@ entry:

- + -
+

The X86 code generator lives in the lib/Target/X86 directory. This code generator is capable of targeting a variety of x86-32 and x86-64 processors, and includes support for ISA extensions such as MMX and SSE.

-
- - + -
+

The following are the known target triples that are supported by the X86 backend. This is not an exhaustive list, and it would be useful to add those @@ -2469,31 +2771,34 @@ entry:

- + -
+

&#13; The following target-specific calling conventions are known to the backend:

    -
  • x86_StdCall — stdcall calling convention seen on Microsoft - Windows platform (CC ID = 64).
  • - -
  • x86_FastCall — fastcall calling convention seen on Microsoft - Windows platform (CC ID = 65).
  • +
  • x86_StdCall — stdcall calling convention seen on Microsoft + Windows platform (CC ID = 64).
  • +
  • x86_FastCall — fastcall calling convention seen on Microsoft + Windows platform (CC ID = 65).
  • +
  • x86_ThisCall — Similar to X86_StdCall. Passes first argument + in ECX, others via stack. Callee is responsible for stack cleaning. This + convention is used by MSVC by default for methods in its ABI + (CC ID = 70).
- + -
+

The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following expression directly in integer @@ -2526,11 +2831,11 @@ OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg

- + -
+

x86 has a feature which provides the ability to perform loads and stores to different address spaces @@ -2571,11 +2876,11 @@ OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm PhysReg

- + -
+

&#13; An instruction name consists of the base name, a default operand size, and a character per operand with an optional special size. For example:

@@ -2591,25 +2896,25 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
+
+ - + -
+

The PowerPC code generator lives in the lib/Target/PowerPC directory. The code generation is retargetable to several variations or subtargets of the PowerPC ISA; including ppc32, ppc64 and altivec.

-
- - + -
+

LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC relative (PIC) or static addressing for accessing global values, so no TOC @@ -2625,11 +2930,11 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory

- + -
+

The size of a PowerPC frame is usually fixed for the duration of a function's invocation. Since the frame is fixed size, all references @@ -2772,11 +3077,11 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory

- + -
+

The llvm prolog and epilog are the same as described in the PowerPC ABI, with the following exceptions. Callee saved registers are spilled after the frame @@ -2789,16 +3094,83 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory

- + -
+

TODO - More to come.

+
+ + +

+ The PTX backend +

+ +
+ +

The PTX code generator lives in the lib/Target/PTX directory. It is + currently a work-in-progress, but already supports most of the code + generation functionality needed to generate correct PTX kernels for + CUDA devices.

+ +

The code generator can target PTX 2.0+, and shader model 1.0+. The + PTX ISA Reference Manual is used as the primary source of ISA + information, though an effort is made to make the output of the code + generator match the output of the NVidia nvcc compiler, whenever + possible.

+ +

Code Generator Options:

+ + + + + + + + + + + + + + + + + +
OptionDescription
doubleIf enabled, the map_f64_to_f32 directive is + disabled in the PTX output, allowing native double-precision + arithmetic
no-fmaDisable generation of Fused-Multiply Add + instructions, which may be beneficial for some devices
smxy / computexySet shader model/compute capability to x.y, + e.g. sm20 or compute13
+ +

Working:

+
    +
  • Arithmetic instruction selection (including combo FMA)
  • +
  • Bitwise instruction selection
  • +
  • Control-flow instruction selection
  • +
  • Function calls (only on SM 2.0+ and no return arguments)
  • +
  • Addresses spaces (0 = global, 1 = constant, 2 = local, 4 = + shared)
  • +
  • Thread synchronization (bar.sync)
  • +
  • Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)
  • +
+ +

In Progress:

+
    +
  • Robust call instruction selection
  • +
  • Stack frame allocation
  • +
  • Device-specific instruction scheduling optimizations
  • +
+ + +
+ +

@@ -2809,7 +3181,7 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"> Chris Lattner
- The LLVM Compiler Infrastructure
+ The LLVM Compiler Infrastructure
Last modified: $Date$