<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
- <li><a href="#mregisterinfo">The <tt>MRegisterInfo</tt> class</a></li>
+ <li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
</li>
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
+ <li><a href="#tailcallopt">Tail call optimization</a></li>
<li><a href="#x86">The X86 backend</a></li>
<li><a href="#ppc">The PowerPC backend</a>
<ul>
<!-- ======================================================================= -->
<div class="doc_subsection">
- <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a>
+ <a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
</div>
<div class="doc_text">
-<p>The <tt>MRegisterInfo</tt> class (which will eventually be renamed to
-<tt>TargetRegisterInfo</tt>) is used to describe the register file of the
-target and any interactions between the registers.</p>
+<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register
+file of the target and any interactions between the registers.</p>
<p>Registers in the code generator are represented in the code generator by
unsigned integers. Physical registers (those that actually exist in the target
(used to indicate whether one register overlaps with another).
</p>
-<p>In addition to the per-register description, the <tt>MRegisterInfo</tt> class
-exposes a set of processor specific register classes (instances of the
+<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
+class exposes a set of processor specific register classes (instances of the
<tt>TargetRegisterClass</tt> class). Each register class contains sets of
registers that have the same properties (for example, they are all 32-bit
integer registers). Each SSA virtual register created by the instruction
<div class="doc_code">
<pre>
-int %test(int %X, int %Y) {
- %Z = div int %X, %Y
- ret int %Z
+define i32 @test(i32 %X, i32 %Y) {
+ %Z = udiv i32 %X, %Y
+ ret i32 %Z
}
</pre>
</div>
corresponds one-to-one with the LLVM function input to the instruction selector.
In addition to a list of basic blocks, the <tt>MachineFunction</tt> contains a
a <tt>MachineConstantPool</tt>, a <tt>MachineFrameInfo</tt>, a
-<tt>MachineFunctionInfo</tt>, a <tt>SSARegMap</tt>, and a set of live in and
-live out registers for the function. See
+<tt>MachineFunctionInfo</tt>, and a <tt>MachineRegisterInfo</tt>. See
<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
</div>
<p>
Instruction Selection is the process of translating LLVM code presented to the
code generator into target-specific machine instructions. There are several
-well-known ways to do this in the literature. In LLVM there are two main forms:
-the SelectionDAG based instruction selector framework and an old-style 'simple'
-instruction selector, which effectively peephole selects each LLVM instruction
-into a series of machine instructions. We recommend that all targets use the
-SelectionDAG infrastructure.
+well-known ways to do this in the literature. LLVM uses a SelectionDAG based
+instruction selector.
</p>
<p>Portions of the DAG instruction selector are generated from the target
description (<tt>*.td</tt>) files. Our goal is for the entire instruction
-selector to be generated from these <tt>.td</tt> files.</p>
+selector to be generated from these <tt>.td</tt> files, though currently
+there are still things that require custom C++ code.</p>
</div>
<!-- _______________________________________________________________________ -->
both the dividend and the remainder. Many other situations require multiple
values as well. Each node also has some number of operands, which are edges
to the node defining the used value. Because nodes may define multiple values,
-edges are represented by instances of the <tt>SDOperand</tt> class, which is
+edges are represented by instances of the <tt>SDValue</tt> class, which is
a <tt><SDNode, unsigned></tt> pair, indicating the node and result
value being used, respectively. Each value produced by an <tt>SDNode</tt> has
-an associated <tt>MVT::ValueType</tt> indicating what type the value is.</p>
+an associated <tt>MVT</tt> (Machine Value Type) indicating what the type of the
+value is.</p>
<p>SelectionDAGs contain two different kinds of values: those that represent
data flow and those that represent control flow dependencies. Data values are
rest of the code generation passes are run.</p>
<p>One great way to visualize what is going on here is to take advantage of a
-few LLC command line options. In particular, the <tt>-view-isel-dags</tt>
-option pops up a window with the SelectionDAG input to the Select phase for all
-of the code compiled (if you only get errors printed to the console while using
-this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
-your system</a> to add support for it). The <tt>-view-sched-dags</tt> option
-views the SelectionDAG output from the Select phase and input to the Scheduler
-phase.</p>
+few LLC command line options. The following options pop up a window displaying
+the SelectionDAG at specific times (if you only get errors printed to the console
+while using this, you probably
+<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a> to
+add support for it).</p>
+
+<ul>
+<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built, before
+ the first optimization pass.</li>
+<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
+<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
+ optimization pass.</li>
+<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
+<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
+</ul>
+
+<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency
+graph. This graph is based on the final SelectionDAG, with nodes that must be
+scheduled together bundled into a single scheduling-unit node, and with
+immediate operands and other nodes that aren't relevant for scheduling
+omitted.
+</p>
</div>
<li>There is no great way to support matching complex addressing modes yet. In
the future, we will extend pattern fragments to allow them to define
multiple values (e.g. the four operands of the <a href="#x86_memory">X86
- addressing mode</a>). In addition, we'll extend fragments so that a
+ addressing mode</a>, which are currently matched with custom C++ code).
+ In addition, we'll extend fragments so that a
fragment can match multiple different patterns.</li>
<li>We don't automatically infer flags like isStore/isLoad yet.</li>
<li>We don't automatically generate the set of supported registers and
marked as <i>aliased</i> in LLVM. Given a particular architecture, you
can check which registers are aliased by inspecting its
<tt>RegisterInfo.td</tt> file. Moreover, the method
-<tt>MRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
+<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
all the physical registers aliased to the register <tt>p_reg</tt>.</p>
<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
bool RegMapping_Fer::compatible_class(MachineFunction &mf,
unsigned v_reg,
unsigned p_reg) {
- assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
+ assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
"Target register must be physical");
- const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
- return trc->contains(p_reg);
+ const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
+ return trc->contains(p_reg);
}
</pre>
</div>
number. The smallest virtual register is normally assigned the number
1024. This may change, so, in order to know which is the first virtual
register, you should access
-<tt>MRegisterInfo::FirstVirtualRegister</tt>. Any register whose
+<tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
number is greater than or equal to
-<tt>MRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
+<tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
register. Whereas physical registers are statically defined in a
<tt>TargetRegisterInfo.td</tt> file and cannot be created by the
application developer, that is not the case with virtual registers.
In order to create new virtual registers, use the method
-<tt>SSARegMap::createVirtualRegister()</tt>. This method will return a
+<tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method will return a
virtual register with the highest code.
</p>
<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, that we will call <i>direct mapping</i>,
-is based on the use of methods of the classes <tt>MRegisterInfo</tt>,
+is based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
and <tt>MachineOperand</tt>. The second way, that we will call
<i>indirect mapping</i>, relies on the <tt>VirtRegMap</tt> class in
order to insert loads and stores sending and getting values to and from
memory. To assign a physical register to a virtual register present in
a given operand, use <tt>MachineOperand::setReg(p_reg)</tt>. To insert
a store instruction, use
-<tt>MRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
-instruction, use <tt>MRegisterInfo::loadRegFromStackSlot</tt>.</p>
+<tt>TargetRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
+instruction, use <tt>TargetRegisterInfo::loadRegFromStackSlot</tt>.</p>
<p>The indirect mapping shields the application developer from the
complexities of inserting load and store instructions. In order to map
<div class="doc_code">
<pre>
%a = MOVE %b
-%a = ADD %a %b
+%a = ADD %a %c
</pre>
</div>
<p>Notice that, internally, the second instruction is represented as
-<tt>ADD %a[def/use] %b</tt>. I.e., the register operand <tt>%a</tt> is
+<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is
both used and defined by the instruction.</p>
</div>
</div>
<p>Instructions can be folded with the
-<tt>MRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
+<tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
taken when folding instructions; a folded instruction can be quite
different from the original instruction. See
<tt>LiveIntervals::addIntervalsForSpills</tt> in
</div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="tailcallopt">Tail call optimization</a>
+</div>
+
+<div class="doc_text">
+ <p>Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:
+ <ul>
+ <li>Caller and callee have the calling convention <tt>fastcc</tt>.</li>
+ <li>The call is a tail call, that is, it is in tail position (<tt>ret</tt> immediately follows the call, and <tt>ret</tt> uses the value of the call or is void).</li>
+ <li>Option <tt>-tailcallopt</tt> is enabled.</li>
+ <li>Platform specific constraints are met.</li>
+ </ul>
+ </p>
+ <p>x86/x86-64 constraints:
+ <ul>
+ <li>No variable argument lists are used.</li>
+ <li>On x86-64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+ </ul>
+ </p>
+ <p>PowerPC constraints:
+ <ul>
+ <li>No variable argument lists are used.</li>
+ <li>No byval parameters are used.</li>
+ <li>On ppc32/64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+ </ul>
+ </p>
+ <p>Example:</p>
+ <p>Call as <tt>llc -tailcallopt test.ll</tt>.
+ <div class="doc_code">
+ <pre>
+declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
+
+define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
+ %l1 = add i32 %in1, %in2
+ %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
+ ret i32 %tmp
+}</pre>
+ </div>
+ </p>
+ <p>Implications of <tt>-tailcallopt</tt>:</p>
+ <p>To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. This currently causes each <tt>fastcc</tt> call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack, so performance might be worse in such cases.</p>
+ <p>On x86 and x86-64, one register is reserved for indirect tail calls (e.g. via a function pointer), so there is one fewer register for integer argument passing. For x86 this means 2 registers are used (if the <tt>inreg</tt> parameter attribute is used), and for x86-64 this means 5 registers are used.</p>
+</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="x86">The X86 backend</a>
<div class="doc_text">
<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
-code generator currently targets a generic P6-like processor. As such, it
-produces a few P6-and-above instructions (like conditional moves), but it does
-not make use of newer features like MMX or SSE. In the future, the X86 backend
-will have sub-target support added for specific processor families and
-implementations.</p>
+code generator is capable of targeting a variety of x86-32 and x86-64
+processors, and includes support for ISA extensions such as MMX and SSE.
+</p>
</div>