
The LLVM Target-Independent Code Generator

+
1. Introduction
  2. The TargetJITInfo class
  3. -
  4. Machine code description classes +
  5. The "Machine" Code Generator classes +
  6. +
  7. The "MC" Layer +
8. Target-independent code generation algorithms
  9. Instruction folding
  10. Built in register allocators
  11. -
  12. Code Emission -
  13. +
  14. Code Emission
  15. +
  16. VLIW Packetizer + +
  17. +
  18. Implementing a Native Assembler
  19. +
  20. Target-specific Implementation Notes
-


+

Written by the LLVM Team.

-
+

Introduction -

+ -
+

The LLVM target-independent code generator is a framework that provides a suite of reusable components for translating the LLVM internal representation to the machine code for a specified target—either in assembly form (suitable for a static compiler) or in binary machine code format (usable for a JIT compiler). The LLVM target-independent code generator consists of six main components:

    @@ -131,10 +157,17 @@ independently of how they will be used. These interfaces are defined in include/llvm/Target/. -
2. Classes used to represent the code being generated for a target. These classes are intended to be abstract enough to represent the machine code for any target machine. These classes are defined in include/llvm/CodeGen/. At this level, concepts like "constant pool entries" and "jump tables" are explicitly exposed.
4. Classes and algorithms used to represent code at the object file level, the MC Layer. These classes represent assembly-level constructs like labels, sections, and instructions. At this level, concepts like "constant pool entries" and "jump tables" don't exist.
5. Target-independent algorithms used to implement various phases of native code generation (register allocation, scheduling, etc.); these algorithms depend only on the target-description and machine code representation classes, ensuring that they are portable.

    -
- -
+

Required components in the code generator -

+ -
+

The two pieces of the LLVM code generator are the high-level interface to the code generator and the set of reusable components that can be used to build @@ -200,11 +231,11 @@

- + -
+

The LLVM target-independent code generator is designed to support efficient and quality code generation for standard register-based microprocessors. @@ -274,11 +305,11 @@

- + -
+

The target description classes require a detailed description of the target architecture. These target descriptions often have a large amount of common @@ -301,13 +332,15 @@

+
+ - + -
+

The LLVM target description classes (located in the include/llvm/Target directory) provide an abstract description of @@ -323,14 +356,12 @@ TargetMachine class provides accessors that should be implemented by the target.

-
- - + -
+

The TargetMachine class provides virtual methods that are used to access the target-specific implementations of the various target description @@ -346,11 +377,11 @@

- + -
+

The TargetData class is the only required target description class, and it is the only class that is not extensible (you cannot derive a new class from it). It specifies information about how the target lays out memory for structures, the alignment requirements for various data types, the size of pointers in the target, and whether the target is little-endian or big-endian.

- + -
+

The TargetLowering class is used by SelectionDAG based instruction selectors primarily to describe how LLVM code should be lowered to SelectionDAG operations.

- + -
+

The TargetRegisterInfo class is used to describe the register file of the target and any interactions between the registers.

@@ -422,11 +453,11 @@
- + -
+

The TargetInstrInfo class is used to describe the machine instructions supported by the target. It is essentially an array of TargetInstrDescriptor objects, each of which describes one instruction the target supports.

- + -
+

The TargetFrameInfo class is used to provide information about the stack frame layout of the target. It holds the direction of stack growth, the known stack alignment on entry to each function, and the offset to the local area.

- + -
+

The TargetSubtarget class is used to provide information about the specific chip set being targeted. A sub-target informs code generation of which instructions are supported, instruction latencies, and instruction execution itinerary; i.e., which processing units are used, in what order, and for how long.

+ -
+

The TargetJITInfo class exposes an abstract interface used by the Just-In-Time code generator to perform target-specific activities, such as emitting stubs.

+
+ - + -
+

At the high level, LLVM code is translated to a machine-specific representation formed out of MachineFunction, MachineBasicBlock, and MachineInstr instances. This representation supports both an SSA representation for machine code, as well as a register allocated, non-SSA form.

-
- - + -
+

Target machine instructions are represented as instances of the MachineInstr class. This class is an extremely abstract way of @@ -553,14 +584,12 @@

Also if the first operand is a def, it is easier to create instructions whose only def is the first operand.

-
- - + -
+

Machine instructions are created by using the BuildMI functions, located in the include/llvm/CodeGen/MachineInstrBuilder.h file. The @@ -600,18 +629,18 @@ BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);

MI.addReg(Reg, RegState::Define);
 
- + -
+

One important issue that the code generator needs to be aware of is the presence of fixed registers. In particular, there are often places in the @@ -679,11 +708,26 @@ ret

-

+ Call-clobbered registers +

+ +
+ +

Some machine instructions, like calls, clobber a large number of physical registers. Rather than adding <def,dead> operands for all of them, it is possible to use an MO_RegisterMask operand instead. The register mask operand holds a bit mask of preserved registers, and everything else is considered to be clobbered by the instruction.

+
-
+ +

+ Machine code in SSA form +

+ +

MachineInstr's are initially selected in SSA-form, and are maintained in SSA-form until register allocation happens. For the most part, @@ -696,12 +740,14 @@ ret

+
+ - + -
+

The MachineBasicBlock class contains a list of machine instructions (MachineInstr instances). It roughly @@ -714,11 +760,11 @@ ret

- + -
+

The MachineFunction class contains a list of machine basic blocks (MachineBasicBlock instances). It @@ -731,26 +777,258 @@ ret

+ +

+ MachineInstr Bundles +

+ +
+ +

The LLVM code generator can model sequences of instructions as MachineInstr bundles. An MI bundle can model a VLIW group / pack which contains an arbitrary number of parallel instructions. It can also be used to model a sequential list of instructions (potentially with data dependencies) that cannot be legally separated (e.g. ARM Thumb2 IT blocks).

+ +

Conceptually, an MI bundle is an MI with a number of other MIs nested within:

+ +
+
--------------
|   Bundle   | ---------
--------------          \
       |           ----------------
       |           |      MI      |
       |           ----------------
       |                   |
       |           ----------------
       |           |      MI      |
       |           ----------------
       |                   |
       |           ----------------
       |           |      MI      |
       |           ----------------
       |
--------------
|   Bundle   | --------
--------------         \
       |           ----------------
       |           |      MI      |
       |           ----------------
       |                   |
       |           ----------------
       |           |      MI      |
       |           ----------------
       |                   |
       |                  ...
       |
--------------
|   Bundle   | --------
--------------         \
       |
      ...
+ +

MI bundle support does not change the physical representations of MachineBasicBlock and MachineInstr. All the MIs (including top level and nested ones) are stored as a sequential list of MIs. The "bundled" MIs are marked with the 'InsideBundle' flag. A top level MI with the special BUNDLE opcode is used to represent the start of a bundle. It's legal to mix BUNDLE MIs with individual MIs that are not inside bundles nor represent bundles.

+ +

MachineInstr passes should operate on an MI bundle as a single unit. Member methods have been taught to correctly handle bundles and MIs inside bundles. The MachineBasicBlock iterator has been modified to skip over bundled MIs to enforce the bundle-as-a-single-unit concept. An alternative iterator, instr_iterator, has been added to MachineBasicBlock to allow passes to iterate over all of the MIs in a MachineBasicBlock, including those which are nested inside bundles. The top level BUNDLE instruction must have the correct set of register MachineOperand's that represent the cumulative inputs and outputs of the bundled MIs.

+ +

Packing / bundling of MachineInstr's should be done as part of the register allocation super-pass. More specifically, the pass which determines what MIs should be bundled together must be done after the code generator exits SSA form (i.e. after the two-address pass, PHI elimination, and copy coalescing). Bundles should only be finalized (i.e. adding BUNDLE MIs and input and output register MachineOperands) after virtual registers have been rewritten into physical registers. This requirement eliminates the need to add virtual register operands to BUNDLE instructions, which would effectively double the virtual register def and use lists.

+ +
+ +
+ -
- Target-independent code generation algorithms +

+ The "MC" Layer +

+ + +
+ +

+The MC Layer is used to represent and process code at the raw machine code +level, devoid of "high level" information like "constant pools", "jump tables", +"global variables" or anything like that. At this level, LLVM handles things +like label names, machine instructions, and sections in the object file. The +code in this layer is used for a number of important purposes: the tail end of +the code generator uses it to write a .s or .o file, and it is also used by the +llvm-mc tool to implement standalone machine code assemblers and disassemblers. +

+ +

This section describes some of the important classes. There are also a number of important subsystems that interact at this layer; they are described later in this manual.

+ + +

+ The MCStreamer API +

+ +
+ +

MCStreamer is best thought of as an assembler API. It is an abstract API which is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc) but whose API corresponds directly to what you see in a .s file. MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which directly correspond to assembly level directives. It also has an EmitInstruction method, which is used to output an MCInst to the streamer.

+ +

+This API is most important for two clients: the llvm-mc stand-alone assembler is +effectively a parser that parses a line, then invokes a method on MCStreamer. In +the code generator, the Code Emission phase of the code +generator lowers higher level LLVM IR and Machine* constructs down to the MC +layer, emitting directives through MCStreamer.

+ +

+On the implementation side of MCStreamer, there are two major implementations: +one for writing out a .s file (MCAsmStreamer), and one for writing out a .o +file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation +that prints out a directive for each method (e.g. EmitValue -> .byte), but +MCObjectStreamer implements a full assembler. +

+
+ + +

+ The MCContext class +

+ +
+ +

The MCContext class is the owner of a variety of uniqued data structures at the MC layer, including symbols, sections, etc. As such, this is the class that you interact with to create symbols and sections. This class cannot be subclassed.

+ +
+ + +

+ The MCSymbol class +

+ +
+ +

+The MCSymbol class represents a symbol (aka label) in the assembly file. There +are two interesting kinds of symbols: assembler temporary symbols, and normal +symbols. Assembler temporary symbols are used and processed by the assembler +but are discarded when the object file is produced. The distinction is usually +represented by adding a prefix to the label, for example "L" labels are +assembler temporary labels in MachO. +

+ +

MCSymbols are created by MCContext and uniqued there. This means that +MCSymbols can be compared for pointer equivalence to find out if they are the +same symbol. Note that pointer inequality does not guarantee the labels will +end up at different addresses though. It's perfectly legal to output something +like this to the .s file:

+ +

  foo:
  bar:
    .byte 4
+
+ +

In this case, both the foo and bar symbols will have the same address.

+ +
+ + +

+ The MCSection class +

+ +
+ +

+The MCSection class represents an object-file specific section. It is subclassed +by object file specific implementations (e.g. MCSectionMachO, +MCSectionCOFF, MCSectionELF) and these are created and uniqued +by MCContext. The MCStreamer has a notion of the current section, which can be +changed with the SwitchToSection method (which corresponds to a ".section" +directive in a .s file). +

+ +
+ + +

+ The MCInst class +

+ +
+ +

+The MCInst class is a target-independent representation of an instruction. It +is a simple class (much more so than MachineInstr) +that holds a target-specific opcode and a vector of MCOperands. MCOperand, in +turn, is a simple discriminated union of three cases: 1) a simple immediate, +2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an +MCExpr. +

+ +

MCInst is the common currency used to represent machine instructions at the +MC layer. It is the type used by the instruction encoder, the instruction +printer, and the type generated by the assembly parser and disassembler. +

+ +
+ +
+ + +

+ Target-independent code generation algorithms +

-
+

This section documents the phases described in the high-level design of the code generator. It explains how they work and some of the rationale behind their design.

-
- - + -
+

Instruction Selection is the process of translating LLVM code presented to the code generator into target-specific machine instructions. There are several well-known ways to do this in the literature. LLVM uses a SelectionDAG based instruction selector; portions of the selector are generated from the target's .td files, though currently there are still things that require custom C++ code.

-
- - + -
+

The SelectionDAG provides an abstraction for code representation in a way that is amenable to instruction selection using automatic techniques (e.g. dynamic-programming based optimal pattern matching selectors).

- + -
+

SelectionDAG-based instruction selection consists of the following steps:

@@ -856,9 +1132,9 @@ ret SelectionDAG optimizer is run to clean up redundancies exposed by type legalization. -
  • +
  • Legalize SelectionDAG Ops — + This stage transforms SelectionDAG nodes to eliminate any operations + that are unsupported on the target.
  • Optimize SelectionDAG — The SelectionDAG optimizer is run to eliminate inefficiencies introduced by @@ -908,11 +1184,11 @@ ret
  • - + -
    +

    The initial SelectionDAG is naïvely peephole expanded from the LLVM input by the SelectionDAGLowering class in the @@ -928,11 +1204,11 @@ ret

    - + -
    +

    The Legalize phase is in charge of converting a DAG to only use the types that are natively supported by the target.

    @@ -961,11 +1237,11 @@ ret
    - + -
    +

    The Legalize phase is in charge of converting a DAG to only use the operations that are natively supported by the target.

    @@ -993,12 +1269,13 @@ ret
    - +

    + + SelectionDAG Optimization Phase: the DAG Combiner + +

    -
    +

    The SelectionDAG optimization phase is run multiple times for code generation, immediately after the DAG is built and once after each @@ -1028,11 +1305,11 @@ ret

    - + -
    +

    The Select phase is the bulk of the target-specific code for instruction selection. This phase takes a legal SelectionDAG as input, pattern matches @@ -1041,9 +1318,9 @@ ret

%t1 = fadd float %W, %X
%t2 = fmul float %t1, %Y
%t3 = fadd float %t2, %Z
     
    @@ -1089,8 +1366,8 @@ def FADDS : AForm_2<59, 21,

The portion of the instruction definition in bold indicates the pattern used to match the instruction. The DAG operators (like fmul/fadd) are defined in the include/llvm/Target/TargetSelectionDAG.td file. "F4RC" is the register class of the input and result values.

The TableGen DAG instruction selector generator reads the instruction patterns in the .td file and automatically builds parts of the instruction selector for you.

    - + -
    +

    The scheduling phase takes the DAG of target instructions from the selection phase and assigns an order. The scheduler can pick an order depending on @@ -1210,11 +1487,11 @@ def : Pat<(i32 imm:$imm),

    - + -
    +
    1. Optional function-at-a-time selection.
    2. @@ -1224,18 +1501,20 @@ def : Pat<(i32 imm:$imm),
    +
    + - -

    To Be Written

    + +

    To Be Written

    - + -
    +

    Live Intervals are the ranges (intervals) where a variable is live. They are used by some register allocator passes to @@ -1243,14 +1522,12 @@ def : Pat<(i32 imm:$imm), register are live at the same point in the program (i.e., they conflict). When this situation occurs, one virtual register must be spilled.

    -
    - - + -
    +

    The first step in determining the live intervals of variables is to calculate the set of registers that are immediately dead after the instruction (i.e., @@ -1292,11 +1569,11 @@ def : Pat<(i32 imm:$imm),

    - + -
    +

    We now have the information available to perform the live intervals analysis and build the live intervals themselves. We start off by numbering the basic @@ -1311,12 +1588,14 @@ def : Pat<(i32 imm:$imm),

    +
    + - + -
    +

    The Register Allocation problem consists in mapping a program Pv, that can use an unbounded number of virtual registers, @@ -1326,23 +1605,21 @@ def : Pat<(i32 imm:$imm), accommodate all the virtual registers, some of them will have to be mapped into memory. These virtuals are called spilled virtuals.

    -
    - - + -
    +

In LLVM, physical registers are denoted by integer numbers that normally range from 1 to 1023. To see how this numbering is defined for a particular architecture, you can read the GenRegisterNames.inc file for that architecture. For instance, by inspecting lib/Target/X86/X86GenRegisterInfo.inc we see that the 32-bit register EAX is denoted by 43, and the MMX register MM0 is mapped to 65.

Some architectures contain registers that share the same physical location. A notable example is the X86 platform. For instance, in the X86 architecture, the registers EAX, AX and AL share the first eight bits. These physical registers are marked as aliased in LLVM. Given a particular architecture, you can check which registers are aliased by inspecting its RegisterInfo.td file. Moreover, the method MCRegisterInfo::getAliasSet(p_reg) returns an array containing all the physical registers aliased to the register p_reg.

    Physical registers, in LLVM, are grouped in Register Classes. @@ -1380,23 +1657,30 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf, for RegisterClass, the last parameter of which is a list of registers. Just commenting some out is one simple way to avoid them being used. A more polite way is to explicitly exclude some registers from - the allocation order. See the definition of the GR register - class in lib/Target/IA64/IA64RegisterInfo.td for an example of this - (e.g., numReservedRegs registers are hidden.)

    + the allocation order. See the definition of the GR8 register + class in lib/Target/X86/X86RegisterInfo.td for an example of this. +

Virtual registers are also denoted by integer numbers. Contrary to physical registers, different virtual registers never share the same number. Whereas physical registers are statically defined in a TargetRegisterInfo.td file and cannot be created by the application developer, that is not the case with virtual registers. In order to create new virtual registers, use the method MachineRegisterInfo::createVirtualRegister(). This method will return a new virtual register. Use an IndexedMap<Foo, VirtReg2IndexFunctor> to hold information per virtual register. If you need to enumerate all virtual registers, use the function TargetRegisterInfo::index2VirtReg() to find the virtual register numbers:

    + +
    +
  for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
    unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
    stuff(VirtReg);
  }
    +
    +

Before register allocation, the operands of an instruction are mostly virtual registers, although physical registers may also be used. To check which registers an instruction implicitly uses, consult TargetInstrInfo::get(opcode)::ImplicitUses. Pre-colored registers impose constraints on any register allocation algorithm: the register allocator must make sure that none of them are overwritten by the values of virtual registers while still alive.

    - + -
    +

There are two ways to map virtual registers to physical registers (or to memory slots). The first way, that we will call direct mapping, is based on the use of methods of the classes TargetRegisterInfo and MachineOperand; the developer must specify where load and store instructions should be inserted in the target function being compiled in order to get and store values in memory. To assign a physical register to a virtual register present in a given operand, use MachineOperand::setReg(p_reg). To insert a store instruction, use TargetInstrInfo::storeRegToStackSlot(...), and to insert a load instruction, use TargetInstrInfo::loadRegFromStackSlot.

    The indirect mapping shields the application developer from the complexities of inserting load and store instructions. In order to map a virtual register @@ -1486,11 +1770,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

    - + -
    +

    With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is @@ -1522,11 +1806,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

    - + -
    +

    An important transformation that happens during register allocation is called the SSA Deconstruction Phase. The SSA form simplifies many analyses @@ -1546,11 +1830,11 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,

    - + -
    +

    Instruction folding is an optimization performed during register allocation that removes unnecessary copy instructions. For instance, a @@ -1583,32 +1867,38 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf, -

    + -
    +

The LLVM infrastructure provides the application developer with four different register allocators:

      -
• Fast — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.

• Basic — This is an incremental approach to register allocation. Live ranges are assigned to registers one at a time in an order that is driven by heuristics. Since code can be rewritten on-the-fly during allocation, this framework allows interesting allocators to be developed as extensions. It is not itself a production register allocator but is a potentially useful stand-alone mode for triaging bugs and as a performance baseline.

• Greedy — The default allocator. This is a highly tuned implementation of the Basic allocator that incorporates global live range splitting. This allocator works hard to minimize the cost of spill code.

• PBQP — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator. This allocator works by constructing a PBQP problem representing the register allocation problem under consideration, solving this using a PBQP solver, and mapping the solution back to a register assignment.

The type of register allocator used in llc can be chosen with the command line option -regalloc:

$ llc -regalloc=linearscan file.bc -o ln.s;
$ llc -regalloc=fast file.bc -o fa.s;
$ llc -regalloc=pbqp file.bc -o pbqp.s;
     
    +
    + -
    +

    Prolog/Epilog Code Insertion +

    + +
    + + +

    + Compact Unwind +

    + +
    + +

    Throwing an exception requires unwinding out of a function. The + information on how to unwind a given function is traditionally expressed in + DWARF unwind (a.k.a. frame) info. But that format was originally developed + for debuggers to backtrace, and each Frame Description Entry (FDE) requires + ~20-30 bytes per function. There is also the cost of mapping from an address + in a function to the corresponding FDE at runtime. An alternative unwind + encoding is called compact unwind and requires just 4-bytes per + function.

    + +

    The compact unwind encoding is a 32-bit value, which is encoded in an + architecture-specific way. It specifies which registers to restore and from + where, and how to unwind out of the function. When the linker creates a final + linked image, it will create a __TEXT,__unwind_info + section. This section is a small and fast way for the runtime to access + unwind info for any given function. If we emit compact unwind info for the + function, that compact unwind info will be encoded in + the __TEXT,__unwind_info section. If we emit DWARF unwind info, + the __TEXT,__unwind_info section will contain the offset of the + FDE in the __TEXT,__eh_frame section in the final linked + image.

    + +

    For X86, there are three modes for the compact unwind encoding:

    + +
    +
    Function with a Frame Pointer (EBP or RBP)
    +

    EBP/RBP-based frame, where EBP/RBP is pushed + onto the stack immediately after the return address, + then ESP/RSP is moved to EBP/RBP. Thus to + unwind, ESP/RSP is restored with the + current EBP/RBP value, then EBP/RBP is restored + by popping the stack, and the return is done by popping the stack once + more into the PC. All non-volatile registers that need to be restored must + have been saved in a small range on the stack that + starts EBP-4 to EBP-1020 (RBP-8 + to RBP-1020). The offset (divided by 4 in 32-bit mode and 8 + in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). + The registers saved are encoded in bits 0-14 + (mask: 0x00007FFF) as five 3-bit entries from the following + table:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Compact Number    i386 Register    x86-64 Register
      1               EBX              RBX
      2               ECX              R12
      3               EDX              R13
      4               EDI              R14
      5               ESI              R15
      6               EBP              RBP
    + +
    + +
Frameless with a Small Constant Stack Size (EBP or RBP is not used as a frame pointer)
    +

    To return, a constant (encoded in the compact unwind encoding) is added + to the ESP/RSP. Then the return is done by popping the stack + into the PC. All non-volatile registers that need to be restored must have + been saved on the stack immediately after the return address. The stack + size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits + 16-23 (mask: 0x00FF0000). There is a maximum stack size of + 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers + saved is encoded in bits 9-12 (mask: 0x00001C00). Bits 0-9 + (mask: 0x000003FF) contain which registers were saved and + their order. (See + the encodeCompactUnwindRegistersWithoutFrame() function + in lib/Target/X86FrameLowering.cpp for the encoding + algorithm.)

    + +
Frameless with a Large Constant Stack Size (EBP or RBP is not used as a frame pointer)
    +

    This case is like the "Frameless with a Small Constant Stack Size" + case, but the stack size is too large to encode in the compact unwind + encoding. Instead it requires that the function contains "subl + $nnnnnn, %esp" in its prolog. The compact encoding contains the + offset to the $nnnnnn value in the function in bits 9-12 + (mask: 0x00001C00).

    +
    + +
    +
    -

    To Be Written

    + - -

    To Be Written

    + +

    To Be Written

    + -
    +

    Code Emission +

    + +
    + +

    The code emission step of code generation is responsible for lowering from +the code generator abstractions (like MachineFunction, MachineInstr, etc) down +to the abstractions used by the MC layer (MCInst, +MCStreamer, etc). This is +done with a combination of several different classes: the (misnamed) +target-independent AsmPrinter class, target-specific subclasses of AsmPrinter +(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.

    + +

    Since the MC layer works at the level of abstraction of object files, it +doesn't have a notion of functions, global variables etc. Instead, it thinks +about labels, directives, and instructions. A key class used at this time is +the MCStreamer class. This is an abstract API that is implemented in different +ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively +an "assembler API". MCStreamer has one method per directive, such as EmitLabel, +EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly +level directives. +

    + +

    If you are interested in implementing a code generator for a target, there +are three important things that you have to implement for your target:

    + +
      +
1. First, you need a subclass of AsmPrinter for your target. This class implements the general lowering process, converting MachineFunction's into MC label constructs. The AsmPrinter base class provides a number of useful methods and routines, and also allows you to override the lowering process in some important ways. You should get much of the lowering for free if you are implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile class implements much of the common logic.

2. Second, you need to implement an instruction printer for your target. The instruction printer takes an MCInst and renders it to a raw_ostream as text. Most of this is automatically generated from the .td file (when you specify something like "add $dst, $src1, $src2" in the instructions), but you need to implement routines to print operands.

3. Third, you need to implement code that lowers a MachineInstr to an MCInst, usually implemented in "<target>MCInstLower.cpp". This lowering process is often target specific, and is responsible for turning jump table entries, constant pool indices, global variable addresses, etc. into MCLabels as appropriate. This translation layer is also responsible for expanding pseudo ops used by the code generator into the actual machine instructions they correspond to. The MCInsts that are generated by this are fed into the instruction printer or the encoder.
    + +

Finally, at your choosing, you can also implement a subclass of +MCCodeEmitter which lowers MCInst's into machine code bytes and relocations. +This is important if you want to support direct .o file emission, or would like +to implement an assembler for your target.

    + +
    + + +

    + VLIW Packetizer +

    + +
    + +

In a Very Long Instruction Word (VLIW) architecture, the compiler is + responsible for mapping instructions to functional units available on + the architecture. To that end, the compiler creates groups of instructions + called packets or bundles. The VLIW packetizer in LLVM is + a target-independent mechanism to enable the packetization of machine + instructions.

    + + + +

    + Mapping from instructions to functional units +

    + +
    + +

    Instructions in a VLIW target can typically be mapped to multiple functional +units. During the process of packetizing, the compiler must be able to reason +about whether an instruction can be added to a packet. This decision can be +complex since the compiler has to examine all possible mappings of instructions +to functional units. Therefore to alleviate compilation-time complexity, the +VLIW packetizer parses the instruction classes of a target and generates tables +at compiler build time. These tables can then be queried by the provided +machine-independent API to determine if an instruction can be accommodated in a +packet.

    +
    + + +

    + + How the packetization tables are generated and used + +

    + +
    + +

    The packetizer reads instruction classes from a target's itineraries and +creates a deterministic finite automaton (DFA) to represent the state of a +packet. A DFA consists of three major elements: inputs, states, and +transitions. The set of inputs for the generated DFA represents the instruction +being added to a packet. The states represent the possible consumption +of functional units by instructions in a packet. In the DFA, transitions from +one state to another occur on the addition of an instruction to an existing +packet. If there is a legal mapping of functional units to instructions, then +the DFA contains a corresponding transition. The absence of a transition +indicates that a legal mapping does not exist and that the instruction cannot +be added to the packet.

    + +

    To generate tables for a VLIW target, add TargetGenDFAPacketizer.inc +as a target to the Makefile in the target directory. The exported API provides +three functions: DFAPacketizer::clearResources(), +DFAPacketizer::reserveResources(MachineInstr *MI), and +DFAPacketizer::canReserveResources(MachineInstr *MI). These functions +allow a target packetizer to add an instruction to an existing packet and to +check whether an instruction can be added to a packet. See +llvm/CodeGen/DFAPacketizer.h for more information.
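A minimal model of this reserve/can-reserve API might look like the following. This toy class is our own sketch: it tracks the set of feasible functional-unit assignments directly, whereas the real packetizer queries TableGen-generated DFA transition tables instead of recomputing assignments.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Toy packetizer state: each element of States is a bitmask of consumed
// functional units under one feasible assignment so far. An instruction
// class is a list of alternative unit masks it could occupy.
class ToyPacketizer {
  std::set<uint32_t> States{0};
public:
  void clearResources() { States = {0}; }

  bool canReserveResources(const std::vector<uint32_t> &Alts) const {
    for (uint32_t S : States)
      for (uint32_t A : Alts)
        if ((S & A) == 0)  // some free unit mapping exists
          return true;
    return false;
  }

  bool reserveResources(const std::vector<uint32_t> &Alts) {
    std::set<uint32_t> Next;
    for (uint32_t S : States)
      for (uint32_t A : Alts)
        if ((S & A) == 0)
          Next.insert(S | A);
    if (Next.empty())
      return false;  // no transition: instruction doesn't fit the packet
    States = std::move(Next);
    return true;
  }
};
```

With two functional units (bit 0 and bit 1), an instruction class that can only use unit 0 fills that unit, after which a second copy of it no longer fits, while a class that can use either unit still does.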

    + +
    +
    -

    To Be Written

    + +
    + + +

    + Implementing a Native Assembler +

    + + +
    + +

Though you're probably reading this because you want to write or maintain a +compiler backend, LLVM also fully supports building a native assembler. +We've tried hard to automate the generation of the assembler from the .td files +(in particular the instruction syntax and encodings), which means that a large +part of the manual and repetitive data entry can be factored and shared with the +compiler.

    + + +

    Instruction Parsing

    + +

    To Be Written

    + + + +

    + Instruction Alias Processing +

    + +
    +

    Once the instruction is parsed, it enters the MatchInstructionImpl function. +The MatchInstructionImpl function performs alias processing and then does +actual matching.

    + +

Alias processing is the phase that canonicalizes different lexical forms of +the same instructions down to one representation. There are several different +kinds of aliases that can be implemented; they are listed below in the +order in which they are processed (from simplest/weakest to most +complex/powerful). Generally you want to use the first alias mechanism that +meets the needs of your instruction, because it will allow a more concise +description.

    + -
    - Generating Assembly Code +

    Mnemonic Aliases

    + +
    + +

The first phase of alias processing is simple instruction mnemonic +remapping for classes of instructions which are allowed with two different +mnemonics. This phase is a simple and unconditional remapping from one input +mnemonic to one output mnemonic. It isn't possible for this form of alias to +look at the operands at all, so the remapping must apply for all forms of a +given mnemonic. Mnemonic aliases are defined simply; for example, X86 has: +

    + +
    +
    +def : MnemonicAlias<"cbw",     "cbtw">;
    +def : MnemonicAlias<"smovq",   "movsq">;
    +def : MnemonicAlias<"fldcww",  "fldcw">;
    +def : MnemonicAlias<"fucompi", "fucomip">;
    +def : MnemonicAlias<"ud2a",    "ud2">;
    +
    -

    To Be Written

    + +

... and many others. With a MnemonicAlias definition, the mnemonic is +remapped simply and directly. Though MnemonicAlias's can't look at any aspect +of the instruction (such as the operands), they can depend on global modes (the +same ones supported by the matcher), through a Requires clause:

    + +
    +
    +def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
    +def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
    +
    +
    + +

In this example, the mnemonic gets mapped into a different one depending +on the current instruction set.
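This mode-dependent remapping can be pictured as a table lookup keyed on a (mnemonic, mode) pair. The function below is our own toy illustration of the idea, not the TableGen-generated matcher code; the mode enum stands in for a Requires clause.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for the matcher's global modes (a Requires<[...]> clause).
enum Mode { In32BitMode, In64BitMode };

// Unconditional per-mode mnemonic remapping: one input mnemonic maps to
// one output mnemonic; unaliased mnemonics pass through unchanged.
std::string remapMnemonic(const std::string &Mnemonic, Mode M) {
  static const std::map<std::pair<std::string, Mode>, std::string> Aliases = {
      {{"cbw",   In32BitMode}, "cbtw"},
      {{"cbw",   In64BitMode}, "cbtw"},
      {{"pushf", In32BitMode}, "pushfl"},
      {{"pushf", In64BitMode}, "pushfq"},
  };
  auto It = Aliases.find({Mnemonic, M});
  return It == Aliases.end() ? Mnemonic : It->second;
}
```

Here "pushf" remaps differently in 32-bit and 64-bit mode, mirroring the two MnemonicAlias definitions above, while a mnemonic with no alias is returned as-is.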

    + +
    + -
    - Generating Binary Machine Code +

    Instruction Aliases

    + +
    + +

    The most general phase of alias processing occurs while matching is +happening: it provides new forms for the matcher to match along with a specific +instruction to generate. An instruction alias has two parts: the string to +match and the instruction to generate. For example: +

    + +
    +
    +def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8  :$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX32rr8  GR32:$dst, GR8  :$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX64rr8  GR64:$dst, GR8  :$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
    +def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
    +
    +
    + +

This shows a powerful example of instruction aliases, matching the +same mnemonic in multiple different ways depending on what operands are present +in the assembly. The result of instruction aliases can include operands in a +different order than the destination instruction, and can use an input +multiple times, for example:

    + +
    +
    +def : InstAlias<"clrb $reg", (XOR8rr  GR8 :$reg, GR8 :$reg)>;
    +def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
    +def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
    +def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
    +
    +
    + +

    This example also shows that tied operands are only listed once. In the X86 +backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied +to the output). InstAliases take a flattened operand list without duplicates +for tied operands. The result of an instruction alias can also use immediates +and fixed physical registers which are added as simple immediate operands in the +result, for example:

    + +
    +
    +// Fixed Immediate operand.
    +def : InstAlias<"aad", (AAD8i8 10)>;
    +
    +// Fixed register operand.
    +def : InstAlias<"fcomi", (COM_FIr ST1)>;
    +
    +// Simple alias.
    +def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
    +
    +
    + + +

    Instruction aliases can also have a Requires clause to make them +subtarget specific.

    + +

    If the back-end supports it, the instruction printer can automatically emit + the alias rather than what's being aliased. It typically leads to better, + more readable code. If it's better to print out what's being aliased, then + pass a '0' as the third parameter to the InstAlias definition.

    +
    -
    -

    For the JIT or .o file writer

    + +

    Instruction Matching

    + +

    To Be Written

    + +
    - + -
    +

This section of the document explains features or design decisions that are - specific to the code generator for a particular target. We start + with a table that summarizes which features each target supports.

    + specific to the code generator for a particular target. First we start + with a table that summarizes what features are supported by each target.

    + + +

    + Target Feature Matrix +

    + +
    + +

Note that this table does not include the C backend or Cpp backends, since +they do not use the target independent code generator infrastructure. It also +doesn't list features that are not supported fully by any target yet. It +considers a feature to be supported if at least one subtarget supports it. A +feature being supported means that it is useful and works for most cases; it +does not indicate that there are zero known bugs in the implementation. Here +is the key:

    + + + + + + + + + + + + + + + +
Unknown / No support / Partial Support / Complete Support
    + +

    Here is the table:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Target
Feature / ARM / CellSPU / Hexagon / MBlaze / MSP430 / Mips / PTX / PowerPC / Sparc / X86 / XCore
    is generally reliable
    assembly parser
    disassembler
    inline asm
    jit*
    .o file writing
    tail calls
    segmented stacks *
    + + +

    Is Generally Reliable

    + +
    +

This box indicates whether the target is considered to be production quality. +This means that the target has been used as a static compiler to +compile large amounts of code by a variety of different people and is in +continuous use.

    +
    + + +

    Assembly Parser

    + +
    +

    This box indicates whether the target supports parsing target specific .s +files by implementing the MCAsmParser interface. This is required for llvm-mc +to be able to act as a native assembler and is required for inline assembly +support in the native .o file writer.

    + +
    + + + +

    Disassembler

    + +
    +

    This box indicates whether the target supports the MCDisassembler API for +disassembling machine opcode bytes into MCInst's.

    + +
    + + +

    Inline Asm

    + +
    +

    This box indicates whether the target supports most popular inline assembly +constraints and modifiers.

    + +
    + + +

    JIT Support

    + +
    +

    This box indicates whether the target supports the JIT compiler through +the ExecutionEngine interface.

    + +

    The ARM backend has basic support for integer code +in ARM codegen mode, but lacks NEON and full Thumb support.

    + +
    + + +

    .o File Writing

    + +
    + +

This box indicates whether the target supports writing .o files (e.g. MachO, +ELF, and/or COFF) directly from the target. Note that the target also +must include an assembly parser and general inline assembly support for full +inline assembly support in the .o writer.

    + +

Targets that don't support this feature can obviously still write out .o +files; they just rely on having an external assembler to translate from a .s +file to a .o file (as is the case for many C compilers).

    + +
    + + +

    Tail Calls

    + +
    + +

This box indicates whether the target supports guaranteed tail calls. These +are calls marked "tail" and use the fastcc +calling convention. Please see the tail call section +for more details.

    + +
    + + +

    Segmented Stacks

    + +
    + +

    This box indicates whether the target supports segmented stacks. This +replaces the traditional large C stack with many linked segments. It +is compatible with the gcc +implementation used by the Go front end.

    + +

    Basic support exists on the X86 backend. Currently +vararg doesn't work and the object files are not marked the way the gold +linker expects, but simple Go programs can be built by dragonegg.

    + +
    - + -
    +

Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:

      -
    • Caller and callee have the calling convention fastcc.
    • +
    • Caller and callee have the calling convention fastcc or + cc 10 (GHC call convention).
    • The call is a tail call - in tail position (ret immediately follows call and ret uses value of call or is void).
    • @@ -1731,31 +2685,68 @@ define fastcc i32 @tailcaller(i32 %in1, i32 %in2) { (because one or more of above constraints are not met) to be followed by a readjustment of the stack. So performance might be worse in such cases.

      -

On x86 and x86-64 one register is reserved for indirect tail calls (e.g. via a - function pointer). So there is one less register for integer argument - passing. For x86 this means 2 registers (if inreg parameter - attribute is used) and for x86-64 this means 5 registers are used.

      +
    + +

    + Sibling call optimization +

    + +
    + +

Sibling call optimization is a restricted form of tail call optimization. + Unlike the tail call optimization described in the previous section, it can be + performed automatically on any tail call when the -tailcallopt option + is not specified.

    + +

    Sibling call optimization is currently performed on x86/x86-64 when the + following constraints are met:

    + +
      +
    • Caller and callee have the same calling convention. It can be either + c or fastcc. + +
    • The call is a tail call - in tail position (ret immediately follows call + and ret uses value of call or is void).
    • + +
    • Caller and callee have matching return type or the callee result is not + used. + +
• If any of the callee arguments are passed on the stack, they must be + available in the caller's own incoming argument stack and the frame offsets + must be the same. 
    + +

    Example:

    +
    +
    +declare i32 @bar(i32, i32)
    +
    +define i32 @foo(i32 %a, i32 %b, i32 %c) {
    +entry:
    +  %0 = tail call i32 @bar(i32 %a, i32 %b)
    +  ret i32 %0
    +}
    +
    +
    - + -
    +

    The X86 code generator lives in the lib/Target/X86 directory. This code generator is capable of targeting a variety of x86-32 and x86-64 processors, and includes support for ISA extensions such as MMX and SSE.

    -
    - - + -
    +

    The following are the known target triples that are supported by the X86 backend. This is not an exhaustive list, and it would be useful to add those @@ -1773,36 +2764,41 @@ define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {

  • i386-pc-mingw32msvc — MingW crosscompiler on Linux
  • i686-apple-darwin* — Apple Darwin on X86
  • + +
  • x86_64-unknown-linux-gnu — Linux
  • - + -
    +

    The following target-specific calling conventions are known to backend:

      -
    • x86_StdCall — stdcall calling convention seen on Microsoft - Windows platform (CC ID = 64).
    • - -
    • x86_FastCall — fastcall calling convention seen on Microsoft - Windows platform (CC ID = 65).
    • +
    • x86_StdCall — stdcall calling convention seen on Microsoft + Windows platform (CC ID = 64).
    • +
    • x86_FastCall — fastcall calling convention seen on Microsoft + Windows platform (CC ID = 65).
    • +
    • x86_ThisCall — Similar to X86_StdCall. Passes first argument + in ECX, others via stack. Callee is responsible for stack cleaning. This + convention is used by MSVC by default for methods in its ABI + (CC ID = 70).
    - + -
    +

    The x86 has a very flexible way of accessing memory. It is capable of forming memory addresses of the following expression directly in integer @@ -1810,35 +2806,38 @@ define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {

    -Base + [1,2,4,8] * IndexReg + Disp32
    +SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
     
    -

    In order to represent this, LLVM tracks no less than 4 operands for each +

    In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means that the "load" form of 'mov' has the following MachineOperands in this order:

    -Index:        0     |    1        2       3           4
    -Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement
    -OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm
    +Index:        0     |    1        2       3           4          5
    +Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
    +OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
     

    Stores, and all other instructions, treat the four memory operands in the - same way and in the same order.

    + same way and in the same order. If the segment register is unspecified + (regno = 0), then no segment override is generated. "Lea" operations do not + have a segment register specified, so they only have 4 operands for their + memory reference.
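The way these memory operands combine into an effective address can be sketched as follows. The struct and helper are a hypothetical illustration (LLVM's actual representation uses MachineOperands, and register operands hold register numbers rather than values); the segment override is only noted in a field, not modeled in the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the x86 memory-operand fields described above:
//   Base + Scale * Index + Displacement, with an optional segment override.
struct X86MemOperand {
  int64_t  Base;     // value of the base register (0 if none)
  int64_t  Scale;    // 1, 2, 4 or 8
  int64_t  Index;    // value of the index register (0 if none)
  int64_t  Disp;     // sign-extended displacement
  unsigned Segment;  // segment register number; 0 = no override
};

int64_t effectiveAddress(const X86MemOperand &M) {
  // A real segment override would add the segment base; here we only
  // model the flat-address arithmetic.
  return M.Base + M.Scale * M.Index + M.Disp;
}
```

For example, a base of 0x1000, scale 4, index 3, and displacement 8 addresses byte 0x1000 + 4*3 + 8.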

    - + -
    +
    -

    x86 has an experimental feature which provides +

    x86 has a feature which provides the ability to perform loads and stores to different address spaces via the x86 segment registers. A segment override prefix byte on an instruction causes the instruction's memory access to go to the specified @@ -1877,11 +2876,11 @@ OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg, SignExtImm

    - + -
    +

An instruction name consists of the base name, a default operand size, and a character per operand with an optional special size. For example:

    @@ -1897,25 +2896,25 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
    +
    + - + -
    +

The PowerPC code generator lives in the lib/Target/PowerPC directory. The code generation is retargetable to several variations or subtargets of the PowerPC ISA, including ppc32, ppc64 and altivec.

    -
    - - + -
    +

LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses PC-relative (PIC) or static addressing for accessing global values, so no TOC

    - + -
    +

    The size of a PowerPC frame is usually fixed for the duration of a function's invocation. Since the frame is fixed size, all references @@ -2078,11 +3077,11 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory

    - + -
    +

    The llvm prolog and epilog are the same as described in the PowerPC ABI, with the following exceptions. Callee saved registers are spilled after the frame @@ -2095,16 +3094,83 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory

    - + -
    +

    TODO - More to come.

    +
    + + +

    + The PTX backend +

    + +
    + +

    The PTX code generator lives in the lib/Target/PTX directory. It is + currently a work-in-progress, but already supports most of the code + generation functionality needed to generate correct PTX kernels for + CUDA devices.

    + +

    The code generator can target PTX 2.0+, and shader model 1.0+. The + PTX ISA Reference Manual is used as the primary source of ISA + information, though an effort is made to make the output of the code + generator match the output of the NVidia nvcc compiler, whenever + possible.

    + +

    Code Generator Options:

    + + + + + + + + + + + + + + + + + +
Option: Description
double: If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic
no-fma: Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devices
smxy / computexy: Set shader model/compute capability to x.y, e.g. sm20 or compute13
    + +

    Working:

    +
      +
    • Arithmetic instruction selection (including combo FMA)
    • +
    • Bitwise instruction selection
    • +
    • Control-flow instruction selection
    • +
    • Function calls (only on SM 2.0+ and no return arguments)
    • +
    • Addresses spaces (0 = global, 1 = constant, 2 = local, 4 = + shared)
    • +
    • Thread synchronization (bar.sync)
    • +
    • Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)
    • +
    + +

    In Progress:

    +
      +
    • Robust call instruction selection
    • +
    • Stack frame allocation
    • +
    • Device-specific instruction scheduling optimizations
    • +
    + + +
    + +

    @@ -2115,7 +3181,7 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"> Chris Lattner
    - The LLVM Compiler Infrastructure
    + The LLVM Compiler Infrastructure
    Last modified: $Date$