<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>The LLVM Target-Independent Code Generator</title>
<link rel="stylesheet" href="llvm.css" type="text/css">
+
+ <style type="text/css">
+ .unknown { background-color: #C0C0C0; text-align: center; }
+ .unknown:before { content: "?" }
+ .no { background-color: #C11B17 }
+ .no:before { content: "N" }
+ .partial { background-color: #F88017 }
+ .yes { background-color: #0F0; }
+ .yes:before { content: "Y" }
+ </style>
+
</head>
<body>
-<div class="doc_title">
+<h1>
The LLVM Target-Independent Code Generator
-</div>
+</h1>
<ol>
<li><a href="#introduction">Introduction</a>
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
+ <li><a href="#targetfeatures">Target Feature Matrix</a></li>
<li><a href="#tailcallopt">Tail call optimization</a></li>
<li><a href="#sibcallopt">Sibling call optimization</a></li>
<li><a href="#x86">The X86 backend</a></li>
</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="introduction">Introduction</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>The LLVM target-independent code generator is a framework that provides a
suite of reusable components for translating the LLVM internal representation
depend on the target-description and machine code representation classes,
ensuring that it is portable.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="required">Required components in the code generator</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The two pieces of the LLVM code generator are the high-level interface to the
code generator and the set of reusable components that can be used to build
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="high-level-design">The high-level design of the code generator</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The LLVM target-independent code generator is designed to support efficient
and quality code generation for standard register-based microprocessors.
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="tablegen">Using TableGen for target description</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The target description classes require a detailed description of the target
architecture. These target descriptions often have a large amount of common
</div>
+</div>
+
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="targetdesc">Target description classes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>The LLVM target description classes (located in the
<tt>include/llvm/Target</tt> directory) provide an abstract description of
<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
that should be implemented by the target.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
access the target-specific implementations of the various target description
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetdata">The <tt>TargetData</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derived a new
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
selectors primarily to describe how LLVM code should be lowered to
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
of the target and any interactions between the registers.</p>
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
instructions supported by the target. It is essentially an array of
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
stack frame layout of the target. It holds the direction of stack growth, the
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
specific chip set being targeted. A sub-target informs code generation of
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
Just-In-Time code generator to perform target-specific activities, such as
</div>
+</div>
+
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="codegendesc">Machine code description classes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>At the high-level, LLVM code is translated to a machine specific
representation formed out of
SSA representation for machine code, as well as a register allocated, non-SSA
form.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Target machine instructions are represented as instances of the
<tt>MachineInstr</tt> class. This class is an extremely abstract way of
<p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
instructions</a> whose only def is the first operand.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="fixedregs">Fixed (preassigned) registers</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>One important issue that the code generator needs to be aware of is the
presence of fixed registers. In particular, there are often places in the
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ssa">Machine code in SSA form</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
maintained in SSA-form until register allocation happens. For the most part,
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="mc">The "MC" Layer</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>
The MC Layer is used to represent and process code at the raw machine code
like label names, machine instructions, and sections in the object file. The
code in this layer is used for a number of important purposes: the tail end of
the code generator uses it to write a .s or .o file, and it is also used by the
-llvm-mc tool to implement standalone machine codeassemblers and disassemblers.
+llvm-mc tool to implement standalone machine code assemblers and disassemblers.
</p>
<p>
in this manual.
</p>
-</div>
-
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
MCStreamer is best thought of as an assembler API. It is an abstract API which
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mccontext">The <tt>MCContext</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCContext class is the owner of a variety of uniqued data structures at the
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCSymbol class represents a symbol (aka label) in the assembly file. There
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="mcsection">The <tt>MCSection</tt> class</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCSection class represents an object-file specific section. It is subclassed
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
- <a name="mcinst">The <tt>MCInst</tt> class</a></li>
-</div>
+<h3>
+ <a name="mcinst">The <tt>MCInst</tt> class</a>
+</h3>
-<div class="doc_text">
+<div>
<p>
The MCInst class is a target-independent representation of an instruction. It
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="codegenalgs">Target-independent code generation algorithms</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>This section documents the phases described in the
<a href="#high-level-design">high-level design of the code generator</a>.
It explains how they work and some of the rationale behind their design.</p>
-</div>
-
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="instselect">Instruction Selection</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Instruction Selection is the process of translating LLVM code presented to
the code generator into target-specific machine instructions. There are
selector to be generated from these <tt>.td</tt> files, though currently
there are still things that require custom C++ code.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The SelectionDAG provides an abstraction for code representation in a way
that is amenable to instruction selection using automatic techniques
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>SelectionDAG-based instruction selection consists of the following steps:</p>
SelectionDAG optimizer is run to clean up redundancies exposed by type
legalization.</li>
- <li><a href="#selectiondag_legalize">Legalize SelectionDAG Types</a> —
- This stage transforms SelectionDAG nodes to eliminate any types that are
- unsupported on the target.</li>
+ <li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> —
+ This stage transforms SelectionDAG nodes to eliminate any operations
+ that are unsupported on the target.</li>
<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
SelectionDAG optimizer is run to eliminate inefficiencies introduced by
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_build">Initial SelectionDAG Construction</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
input by the <tt>SelectionDAGLowering</tt> class in the
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Legalize phase is in charge of converting a DAG to only use the types
that are natively supported by the target.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Legalize phase is in charge of converting a DAG to only use the
operations that are natively supported by the target.</p>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
- <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
- Combiner</a>
-</div>
+<h4>
+ <a name="selectiondag_optimize">
+ SelectionDAG Optimization Phase: the DAG Combiner
+ </a>
+</h4>
-<div class="doc_text">
+<div>
<p>The SelectionDAG optimization phase is run multiple times for code
generation, immediately after the DAG is built and once after each
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_select">SelectionDAG Select Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The Select phase is the bulk of the target-specific code for instruction
selection. This phase takes a legal SelectionDAG as input, pattern matches
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order. The scheduler can pick an order depending on
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="selectiondag_future">Future directions for the SelectionDAG</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<ol>
<li>Optional function-at-a-time selection.</li>
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="ssamco">SSA-based Machine Code Optimizations</a>
-</div>
-<div class="doc_text"><p>To Be Written</p></div>
+</h3>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="liveintervals">Live Intervals</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
They are used by some <a href="#regalloc">register allocator</a> passes to
register are live at the same point in the program (i.e., they conflict).
When this situation occurs, one virtual register must be <i>spilled</i>.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="livevariable_analysis">Live Variable Analysis</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The first step in determining the live intervals of variables is to calculate
the set of registers that are immediately dead after the instruction (i.e.,
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="liveintervals_analysis">Live Intervals Analysis</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>We now have the information available to perform the live intervals analysis
and build the live intervals themselves. We start off by numbering the basic
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="regalloc">Register Allocation</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The <i>Register Allocation problem</i> consists in mapping a program
<i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
accommodate all the virtual registers, some of them will have to be mapped
into memory. These virtuals are called <i>spilled virtuals</i>.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_represent">How registers are represented in LLVM</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>In LLVM, physical registers are denoted by integer numbers that normally
range from 1 to 1023. To see how this numbering is defined for a particular
</p>
<p>Virtual registers are also denoted by integer numbers. Contrary to physical
- registers, different virtual registers never share the same number. The
- smallest virtual register is normally assigned the number 1024. This may
- change, so, in order to know which is the first virtual register, you should
- access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
- number is greater than or equal
- to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
- register. Whereas physical registers are statically defined in
- a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
- application developer, that is not the case with virtual registers. In order
- to create new virtual registers, use the
+ registers, different virtual registers never share the same number. Whereas
+ physical registers are statically defined in a <tt>TargetRegisterInfo.td</tt>
+ file and cannot be created by the application developer, that is not the case
+ with virtual registers. In order to create new virtual registers, use the
method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
- will return a virtual register with the highest code.</p>
+ will return a new virtual register. Use an <tt>IndexedMap<Foo,
+ VirtReg2IndexFunctor></tt> to hold information per virtual register. If you
+ need to enumerate all virtual registers, use the function
+ <tt>TargetRegisterInfo::index2VirtReg()</tt> to find the virtual register
+ numbers:</p>
+
+<div class="doc_code">
+<pre>
+ for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
+ unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
+ stuff(VirtReg);
+ }
+</pre>
+</div>
<p>Before register allocation, the operands of an instruction are mostly virtual
registers, although physical registers may also be used. In order to check if
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, that we will call <i>direct mapping</i>, is
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_twoAddr">Handling two address instructions</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>With very rare exceptions (e.g., function calls), the LLVM machine code
instructions are three address instructions. That is, each instruction is
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>An important transformation that happens during register allocation is called
the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_fold">Instruction folding</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><i>Instruction folding</i> is an optimization performed during register
allocation that removes unnecessary copy instructions. For instance, a
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="regAlloc_builtIn">Built in register allocators</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The LLVM infrastructure provides the application developer with three
different register allocators:</p>
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="proepicode">Prolog/Epilog Code Insertion</a>
-</div>
-<div class="doc_text"><p>To Be Written</p></div>
+</h3>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="latemco">Late Machine Code Optimizations</a>
-</div>
-<div class="doc_text"><p>To Be Written</p></div>
+</h3>
+<div><p>To Be Written</p></div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="codeemit">Code Emission</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The code emission step of code generation is responsible for lowering from
the code generator abstractions (like <a
</div>
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="nativeassembler">Implementing a Native Assembler</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>Though you're probably reading this because you want to write or maintain a
compiler backend, LLVM also fully supports building a native assemblers too.
part of the manual and repetitive data entry can be factored and shared with the
compiler.</p>
+<!-- ======================================================================= -->
+<h3 id="na_instparsing">Instruction Parsing</h3>
+
+<div><p>To Be Written</p></div>
+
+
+<!-- ======================================================================= -->
+<h3 id="na_instaliases">
+ Instruction Alias Processing
+</h3>
+
+<div>
+<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
+The MatchInstructionImpl function performs alias processing and then does
+actual matching.</p>
+
+<p>Alias processing is the phase that canonicalizes different lexical forms of
+the same instructions down to one representation. There are several different
+kinds of alias that are possible to implement and they are listed below in the
+order that they are processed (which is in order from simplest/weakest to most
+complex/powerful). Generally you want to use the first alias mechanism that
+meets the needs of your instruction, because it will allow a more concise
+description.</p>
+
+<!-- _______________________________________________________________________ -->
+<h4>Mnemonic Aliases</h4>
+<div>
+<p>The first phase of alias processing is simple instruction mnemonic
+remapping for classes of instructions which are allowed with two different
+mnemonics. This phase is a simple and unconditionally remapping from one input
+mnemonic to one output mnemonic. It isn't possible for this form of alias to
+look at the operands at all, so the remapping must apply for all forms of a
+given mnemonic. Mnemonic aliases are defined simply, for example X86 has:
+</p>
+
+<div class="doc_code">
+<pre>
+def : MnemonicAlias<"cbw", "cbtw">;
+def : MnemonicAlias<"smovq", "movsq">;
+def : MnemonicAlias<"fldcww", "fldcw">;
+def : MnemonicAlias<"fucompi", "fucomip">;
+def : MnemonicAlias<"ud2a", "ud2">;
+</pre>
</div>
+<p>... and many others. With a MnemonicAlias definition, the mnemonic is
+remapped simply and directly. Though MnemonicAlias's can't look at any aspect
+of the instruction (such as the operands) they can depend on global modes (the
+same ones supported by the matcher), through a Requires clause:</p>
-<!-- ======================================================================= -->
-<div class="doc_subsection">
- <a name="proepicode">Prolog/Epilog Code Insertion</a>
+<div class="doc_code">
+<pre>
+def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
+def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
+</pre>
</div>
-<div class="doc_text"><p>To Be Written</p></div>
+<p>In this example, the mnemonic gets mapped into different a new one depending
+on the current instruction set.</p>
+</div>
+<!-- _______________________________________________________________________ -->
+<h4>Instruction Aliases</h4>
+
+<div>
+
+<p>The most general phase of alias processing occurs while matching is
+happening: it provides new forms for the matcher to match along with a specific
+instruction to generate. An instruction alias has two parts: the string to
+match and the instruction to generate. For example:
+</p>
+
+<div class="doc_code">
+<pre>
+def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
+def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
+</pre>
+</div>
+
+<p>This shows a powerful example of the instruction aliases, matching the
+same mnemonic in multiple different ways depending on what operands are present
+in the assembly. The result of instruction aliases can include operands in a
+different order than the destination instruction, and can use an input
+multiple times, for example:</p>
+
+<div class="doc_code">
+<pre>
+def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>;
+def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
+def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
+def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
+</pre>
+</div>
+
+<p>This example also shows that tied operands are only listed once. In the X86
+backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied
+to the output). InstAliases take a flattened operand list without duplicates
+for tied operands. The result of an instruction alias can also use immediates
+and fixed physical registers which are added as simple immediate operands in the
+result, for example:</p>
+
+<div class="doc_code">
+<pre>
+// Fixed Immediate operand.
+def : InstAlias<"aad", (AAD8i8 10)>;
+
+// Fixed register operand.
+def : InstAlias<"fcomi", (COM_FIr ST1)>;
+
+// Simple alias.
+def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
+</pre>
+</div>
+
+
+<p>Instruction aliases can also have a Requires clause to make them
+subtarget specific.</p>
+
+<p>If the back-end supports it, the instruction printer can automatically emit
+ the alias rather than what's being aliased. It typically leads to better,
+ more readable code. If it's better to print out what's being aliased, then
+ pass a '0' as the third parameter to the InstAlias definition.</p>
+
+</div>
+
+</div>
+
+<!-- ======================================================================= -->
+<h3 id="na_matching">Instruction Matching</h3>
+
+<div><p>To Be Written</p></div>
+
+</div>
<!-- *********************************************************************** -->
-<div class="doc_section">
+<h2>
<a name="targetimpls">Target-specific Implementation Notes</a>
-</div>
+</h2>
<!-- *********************************************************************** -->
-<div class="doc_text">
+<div>
<p>This section of the document explains features or design decisions that are
- specific to the code generator for a particular target.</p>
+ specific to the code generator for a particular target. First we start
+ with a table that summarizes what features are supported by each target.</p>
+
+<!-- ======================================================================= -->
+<h3>
+ <a name="targetfeatures">Target Feature Matrix</a>
+</h3>
+
+<div>
+
+<p>Note that this table does not include the C backend or Cpp backends, since
+they do not use the target independent code generator infrastructure. It also
+doesn't list features that are not supported fully by any target yet. It
+considers a feature to be supported if at least one subtarget supports it. A
+feature being supported means that it is useful and works for most cases, it
+does not indicate that there are zero known bugs in the implementation. Here
+is the key:</p>
+
+
+<table border="1" cellspacing="0">
+ <tr>
+ <th>Unknown</th>
+ <th>No support</th>
+ <th>Partial Support</th>
+ <th>Complete Support</th>
+ </tr>
+ <tr>
+ <td class="unknown"></td>
+ <td class="no"></td>
+ <td class="partial"></td>
+ <td class="yes"></td>
+ </tr>
+</table>
+
+<p>Here is the table:</p>
+
+<table width="689" border="1" cellspacing="0">
+<tr><td></td>
+<td colspan="13" align="center" style="background-color:#ffc">Target</td>
+</tr>
+ <tr>
+ <th>Feature</th>
+ <th>ARM</th>
+ <th>Alpha</th>
+ <th>Blackfin</th>
+ <th>CellSPU</th>
+ <th>MBlaze</th>
+ <th>MSP430</th>
+ <th>Mips</th>
+ <th>PTX</th>
+ <th>PowerPC</th>
+ <th>Sparc</th>
+ <th>SystemZ</th>
+ <th>X86</th>
+ <th>XCore</th>
+ </tr>
+
+<tr>
+ <td><a href="#feat_reliable">is generally reliable</a></td>
+ <td class="yes"></td> <!-- ARM -->
+ <td class="unknown"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="no"></td> <!-- MBlaze -->
+ <td class="unknown"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="no"></td> <!-- PTX -->
+ <td class="yes"></td> <!-- PowerPC -->
+ <td class="yes"></td> <!-- Sparc -->
+ <td class="unknown"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="unknown"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_asmparser">assembly parser</a></td>
+ <td class="no"></td> <!-- ARM -->
+ <td class="no"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="yes"></td> <!-- MBlaze -->
+ <td class="no"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="no"></td> <!-- PTX -->
+ <td class="no"></td> <!-- PowerPC -->
+ <td class="no"></td> <!-- Sparc -->
+ <td class="no"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="no"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_disassembler">disassembler</a></td>
+ <td class="yes"></td> <!-- ARM -->
+ <td class="no"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="yes"></td> <!-- MBlaze -->
+ <td class="no"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="no"></td> <!-- PTX -->
+ <td class="no"></td> <!-- PowerPC -->
+ <td class="no"></td> <!-- Sparc -->
+ <td class="no"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="no"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_inlineasm">inline asm</a></td>
+ <td class="yes"></td> <!-- ARM -->
+ <td class="unknown"></td> <!-- Alpha -->
+ <td class="yes"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="yes"></td> <!-- MBlaze -->
+ <td class="unknown"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="unknown"></td> <!-- PTX -->
+ <td class="yes"></td> <!-- PowerPC -->
+ <td class="unknown"></td> <!-- Sparc -->
+ <td class="unknown"></td> <!-- SystemZ -->
+ <td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 -->
+ <td class="unknown"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_jit">jit</a></td>
+ <td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
+ <td class="no"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="no"></td> <!-- MBlaze -->
+ <td class="unknown"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="unknown"></td> <!-- PTX -->
+ <td class="yes"></td> <!-- PowerPC -->
+ <td class="unknown"></td> <!-- Sparc -->
+ <td class="unknown"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="unknown"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_objectwrite">.o file writing</a></td>
+ <td class="no"></td> <!-- ARM -->
+ <td class="no"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="yes"></td> <!-- MBlaze -->
+ <td class="no"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="no"></td> <!-- PTX -->
+ <td class="no"></td> <!-- PowerPC -->
+ <td class="no"></td> <!-- Sparc -->
+ <td class="no"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="no"></td> <!-- XCore -->
+</tr>
+
+<tr>
+ <td><a href="#feat_tailcall">tail calls</a></td>
+ <td class="yes"></td> <!-- ARM -->
+ <td class="unknown"></td> <!-- Alpha -->
+ <td class="no"></td> <!-- Blackfin -->
+ <td class="no"></td> <!-- CellSPU -->
+ <td class="no"></td> <!-- MBlaze -->
+ <td class="unknown"></td> <!-- MSP430 -->
+ <td class="no"></td> <!-- Mips -->
+ <td class="unknown"></td> <!-- PTX -->
+ <td class="yes"></td> <!-- PowerPC -->
+ <td class="unknown"></td> <!-- Sparc -->
+ <td class="unknown"></td> <!-- SystemZ -->
+ <td class="yes"></td> <!-- X86 -->
+ <td class="unknown"></td> <!-- XCore -->
+</tr>
+
+
+</table>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_reliable">Is Generally Reliable</h4>
+
+<div>
+<p>This box indicates whether the target is considered to be production quality.
+This indicates that the target has been used as a static compiler to
+compile large amounts of code by a variety of different people and is in
+continuous use.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_asmparser">Assembly Parser</h4>
+
+<div>
+<p>This box indicates whether the target supports parsing target specific .s
+files by implementing the MCAsmParser interface. This is required for llvm-mc
+to be able to act as a native assembler and is required for inline assembly
+support in the native .o file writer.</p>
+
+</div>
+
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_disassembler">Disassembler</h4>
+
+<div>
+<p>This box indicates whether the target supports the MCDisassembler API for
+disassembling machine opcode bytes into MCInst's.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_inlineasm">Inline Asm</h4>
+
+<div>
+<p>This box indicates whether the target supports most popular inline assembly
+constraints and modifiers.</p>
+
+<p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly
+constraints relating to the X86 floating point stack.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_jit">JIT Support</h4>
+
+<div>
+<p>This box indicates whether the target supports the JIT compiler through
+the ExecutionEngine interface.</p>
+
+<p id="feat_jit_arm">The ARM backend has basic support for integer code
+in ARM codegen mode, but lacks NEON and full Thumb support.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_objectwrite">.o File Writing</h4>
+
+<div>
+
+<p>This box indicates whether the target supports writing .o files (e.g. MachO,
+ELF, and/or COFF) files directly from the target. Note that the target also
+must include an assembly parser and general inline assembly support for full
+inline assembly support in the .o writer.</p>
+
+<p>Targets that don't support this feature can obviously still write out .o
+files, they just rely on having an external assembler to translate from a .s
+file to a .o file (as is the case for many C compilers).</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4 id="feat_tailcall">Tail Calls</h4>
+
+<div>
+
+<p>This box indicates whether the target supports guaranteed tail calls. These
+are calls marked "<a href="LangRef.html#i_call">tail</a>" and use the fastcc
+calling convention. Please see the <a href="#tailcallopt">tail call section
+more more details</a>.</p>
+
+</div>
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="tailcallopt">Tail call optimization</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Tail call optimization, callee reusing the stack of the caller, is currently
supported on x86/x86-64 and PowerPC. It is performed if:</p>
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="sibcallopt">Sibling call optimization</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>Sibling call optimization is a restricted form of tail call optimization.
Unlike tail call optimization described in the previous section, it can be
</div>
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="x86">The X86 backend</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
code generator is capable of targeting a variety of x86-32 and x86-64
processors, and includes support for ISA extensions such as MMX and SSE.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_tt">X86 Target Triples supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The following are the known target triples that are supported by the X86
backend. This is not an exhaustive list, and it would be useful to add those
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_cc">X86 Calling Conventions supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The following target-specific calling conventions are known to backend:</p>
<ul>
- <li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
- Windows platform (CC ID = 64).</li>
-
- <li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
- Windows platform (CC ID = 65).</li>
+<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
+ Windows platform (CC ID = 64).</li>
+<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
+ Windows platform (CC ID = 65).</li>
+<li><b>x86_ThisCall</b> — Similar to X86_StdCall. Passes first argument
+ in ECX, others via stack. Callee is responsible for stack cleaning. This
+ convention is used by MSVC by default for methods in its ABI
+ (CC ID = 70).</li>
</ul>
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following expression directly in integer
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_memory">X86 address spaces supported</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
-<p>x86 has an experimental feature which provides
+<p>x86 has a feature which provides
the ability to perform loads and stores to different address spaces
via the x86 segment registers. A segment override prefix byte on an
instruction causes the instruction's memory access to go to the specified
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="x86_names">Instruction naming</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>An instruction name consists of the base name, a default operand size, and a
a character per operand with an optional special size. For example:</p>
</div>
+</div>
+
<!-- ======================================================================= -->
-<div class="doc_subsection">
+<h3>
<a name="ppc">The PowerPC backend</a>
-</div>
+</h3>
-<div class="doc_text">
+<div>
<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
code generation is retargetable to several variations or <i>subtargets</i> of
the PowerPC ISA; including ppc32, ppc64 and altivec.</p>
-</div>
-
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_abi">LLVM PowerPC ABI</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
relative (PIC) or static addressing for accessing global values, so no TOC
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_frame">Frame Layout</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The size of a PowerPC frame is usually fixed for the duration of a
function's invocation. Since the frame is fixed size, all references
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_prolog">Prolog/Epilog</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions. Callee saved registers are spilled after the frame
</div>
<!-- _______________________________________________________________________ -->
-<div class="doc_subsubsection">
+<h4>
<a name="ppc_dynamic">Dynamic Allocation</a>
-</div>
+</h4>
-<div class="doc_text">
+<div>
<p><i>TODO - More to come.</i></p>
</div>
+</div>
+
+</div>
<!-- *********************************************************************** -->
<hr>
src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
- <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+ <a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Last modified: $Date$
</address>