"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
+ <meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>The LLVM Target-Independent Code Generator</title>
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<li><a href="#selectiondag_future">Future directions for the
SelectionDAG</a></li>
</ul></li>
- <ul>
+ <li><a href="#liveintervals">Live Intervals</a>
+ <ul>
+ <li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
+ <li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
+ </ul></li>
<li><a href="#regalloc">Register Allocation</a>
<ul>
<li><a href="#regAlloc_represent">How registers are represented in
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
<li><a href="#x86">The X86 backend</a></li>
- </ul>
- </li>
+ <li><a href="#ppc">The PowerPC backend</a>
+ <ul>
+ <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
+ <li><a href="#ppc_frame">Frame Layout</a></li>
+ <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
+ <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
+ </ul></li>
+ </ul></li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
- <a href="mailto:isanbard@gmail.com">Bill Wendling</a>, and
+ <a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
<a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
- Pereira</a></p>
+ Pereira</a> and
+ <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
</div>
<div class="doc_warning">
<li>the type to use for shift amounts</li>
<li>various high-level characteristics, like whether it is profitable to turn
division by a constant into a multiplication sequence</li>
-</ol>
+</ul>
</div>
<p>
Instruction Selection is the process of translating LLVM code presented to the
code generator into target-specific machine instructions. There are several
-well-known ways to do this in the literature. In LLVM there are two main forms:
-the SelectionDAG based instruction selector framework and an old-style 'simple'
-instruction selector, which effectively peephole selects each LLVM instruction
-into a series of machine instructions. We recommend that all targets use the
-SelectionDAG infrastructure.
+well-known ways to do this in the literature. LLVM uses a SelectionDAG based
+instruction selector.
</p>
<p>Portions of the DAG instruction selector are generated from the target
this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
your system</a> to add support for it). The <tt>-view-sched-dags</tt> option
views the SelectionDAG output from the Select phase and input to the Scheduler
-phase.</p>
+phase. The <tt>-view-sunit-dags</tt> option views the ScheduleDAG, which is
+based on the final SelectionDAG, with nodes that must be scheduled as a unit
+bundled together into a single node, and with immediate operands and other
+nodes that aren't relevent for scheduling omitted.
+</p>
</div>
fragment can match multiple different patterns.</li>
<li>We don't automatically infer flags like isStore/isLoad yet.</li>
<li>We don't automatically generate the set of supported registers and
- operations for the <a href="#"selectiondag_legalize>Legalizer</a> yet.</li>
+ operations for the <a href="#selectiondag_legalize">Legalizer</a> yet.</li>
<li>We don't have a way of tying in custom legalized nodes yet.</li>
</ul>
<ol>
<li>Optional function-at-a-time selection.</li>
<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
-</li>
</ol>
</div>
</div>
<div class="doc_text"><p>To Be Written</p></div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="liveintervals">Live Intervals</a>
+</div>
+
+<div class="doc_text">
+
+<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
+They are used by some <a href="#regalloc">register allocator</a> passes to
+determine if two or more virtual registers which require the same physical
+register are live at the same point in the program (i.e., they conflict). When
+this situation occurs, one virtual register must be <i>spilled</i>.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="livevariable_analysis">Live Variable Analysis</a>
+</div>
+
+<div class="doc_text">
+
+<p>The first step in determining the live intervals of variables is to
+calculate the set of registers that are immediately dead after the
+instruction (i.e., the instruction calculates the value, but it is
+never used) and the set of registers that are used by the instruction,
+but are never used after the instruction (i.e., they are killed). Live
+variable information is computed for each <i>virtual</i> register and
+<i>register allocatable</i> physical register in the function. This
+is done in a very efficient manner because it uses SSA to sparsely
+compute lifetime information for virtual registers (which are in SSA
+form) and only has to track physical registers within a block. Before
+register allocation, LLVM can assume that physical registers are only
+live within a single basic block. This allows it to do a single,
+local analysis to resolve physical register lifetimes within each
+basic block. If a physical register is not register allocatable (e.g.,
+a stack pointer or condition codes), it is not tracked.</p>
+
+<p>Physical registers may be live in to or out of a function. Live in values
+are typically arguments in registers. Live out values are typically return
+values in registers. Live in values are marked as such, and are given a dummy
+"defining" instruction during live intervals analysis. If the last basic block
+of a function is a <tt>return</tt>, then it's marked as using all live out
+values in the function.</p>
+
+<p><tt>PHI</tt> nodes need to be handled specially, because the calculation
+of the live variable information from a depth first traversal of the CFG of
+the function won't guarantee that a virtual register used by the <tt>PHI</tt>
+node is defined before it's used. When a <tt>PHI</tt> node is encounted, only
+the definition is handled, because the uses will be handled in other basic
+blocks.</p>
+
+<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
+assignment at the end of the current basic block and traverse the successor
+basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
+the <tt>PHI</tt> node's operands is coming from the current basic block,
+then the variable is marked as <i>alive</i> within the current basic block
+and all of its predecessor basic blocks, until the basic block with the
+defining instruction is encountered.</p>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="liveintervals_analysis">Live Intervals Analysis</a>
+</div>
+
+<div class="doc_text">
+
+<p>We now have the information available to perform the live intervals analysis
+and build the live intervals themselves. We start off by numbering the basic
+blocks and machine instructions. We then handle the "live-in" values. These
+are in physical registers, so the physical register is assumed to be killed by
+the end of the basic block. Live intervals for virtual registers are computed
+for some ordering of the machine instructions <tt>[1, N]</tt>. A live interval
+is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j < N</tt>, for which a
+variable is live.</p>
+
+<p><i><b>More to come...</b></i></p>
+
+</div>
+
<!-- ======================================================================= -->
<div class="doc_subsection">
<a name="regalloc">Register Allocation</a>
<div class="doc_text">
-<p>The <i>Register Allocation problem</i> consists in mapping a
-program <i>P<sub>v</sub></i>, that can use an unbounded number of
-virtual registers, to a program <i>P<sub>p</sub></i> that contains a
-finite (possibly small) number of physical registers. Each target
-architecture has a different number of physical registers. If the
-number of physical registers is not enough to accommodate all the
-virtual registers, some of them will have to be mapped into
-memory. These virtuals are called <i>spilled virtuals</i>.</p>
+<p>The <i>Register Allocation problem</i> consists in mapping a program
+<i>P<sub>v</sub></i>, that can use an unbounded number of virtual
+registers, to a program <i>P<sub>p</sub></i> that contains a finite
+(possibly small) number of physical registers. Each target architecture has
+a different number of physical registers. If the number of physical
+registers is not enough to accommodate all the virtual registers, some of
+them will have to be mapped into memory. These virtuals are called
+<i>spilled virtuals</i>.</p>
</div>
<div class="doc_code">
<pre>
-bool RegMapping_Fer::compatible_class(MachineFunction &mf,
+bool RegMapping_Fer::compatible_class(MachineFunction &mf,
unsigned v_reg,
unsigned p_reg) {
- assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
+ assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
"Target register must be physical");
const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
return trc->contains(p_reg);
<tt>MachineOperand::isDef()</tt> informs if that registers is being
defined.</p>
-<p>We will call physical registers present in the LLVM bytecode before
+<p>We will call physical registers present in the LLVM bitcode before
register allocation <i>pre-colored registers</i>. Pre-colored
registers are used in many different situations, for instance, to pass
parameters of functions calls, and to store results of particular
<li><b>i386-unknown-freebsd5.3</b> - FreeBSD 5.3</li>
<li><b>i686-pc-cygwin</b> - Cygwin on Win32</li>
<li><b>i686-pc-mingw32</b> - MingW on Win32</li>
+<li><b>i386-pc-mingw32msvc</b> - MingW crosscompiler on Linux</li>
<li><b>i686-apple-darwin*</b> - Apple Darwin on X86</li>
</ul>
</div>
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="x86_cc">X86 Calling Conventions supported</a>
+</div>
+
+
+<div class="doc_text">
+
+<p>The folowing target-specific calling conventions are known to backend:</p>
+
+<ul>
+<li><b>x86_StdCall</b> - stdcall calling convention seen on Microsoft Windows
+platform (CC ID = 64).</li>
+<li><b>x86_FastCall</b> - fastcall calling convention seen on Microsoft Windows
+platform (CC ID = 65).</li>
+</ul>
+
+</div>
+
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">
<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
</div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="ppc">The PowerPC backend</a>
+</div>
+
+<div class="doc_text">
+<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
+code generation is retargetable to several variations or <i>subtargets</i> of
+the PowerPC ISA; including ppc32, ppc64 and altivec.
+</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_abi">LLVM PowerPC ABI</a>
+</div>
+
+<div class="doc_text">
+<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
+relative (PIC) or static addressing for accessing global values, so no TOC (r2)
+is used. Second, r31 is used as a frame pointer to allow dynamic growth of a
+stack frame. LLVM takes advantage of having no TOC to provide space to save
+the frame pointer in the PowerPC linkage area of the caller frame. Other
+details of PowerPC ABI can be found at <a href=
+"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
+>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The
+64 bit ABI is similar except space for GPRs are 8 bytes wide (not 4) and r13 is
+reserved for system use.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_frame">Frame Layout</a>
+</div>
+
+<div class="doc_text">
+<p>The size of a PowerPC frame is usually fixed for the duration of a
+function’s invocation. Since the frame is fixed size, all references into
+the frame can be accessed via fixed offsets from the stack pointer. The
+exception to this is when dynamic alloca or variable sized arrays are present,
+then a base pointer (r31) is used as a proxy for the stack pointer and stack
+pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is
+not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to
+16 bytes, so that space allocated for altivec vectors will be properly
+aligned.</p>
+<p>An invocation frame is layed out as follows (low memory at top);</p>
+</div>
+
+<div class="doc_text">
+<table class="layout">
+ <tr>
+ <td>Linkage<br><br></td>
+ </tr>
+ <tr>
+ <td>Parameter area<br><br></td>
+ </tr>
+ <tr>
+ <td>Dynamic area<br><br></td>
+ </tr>
+ <tr>
+ <td>Locals area<br><br></td>
+ </tr>
+ <tr>
+ <td>Saved registers area<br><br></td>
+ </tr>
+ <tr style="border-style: none hidden none hidden;">
+ <td><br></td>
+ </tr>
+ <tr>
+ <td>Previous Frame<br><br></td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>The <i>linkage</i> area is used by a callee to save special registers prior
+to allocating its own frame. Only three entries are relevant to LLVM. The
+first entry is the previous stack pointer (sp), aka link. This allows probing
+tools like gdb or exception handlers to quickly scan the frames in the stack. A
+function epilog can also use the link to pop the frame from the stack. The
+third entry in the linkage area is used to save the return address from the lr
+register. Finally, as mentioned above, the last entry is used to save the
+previous frame pointer (r31.) The entries in the linkage area are the size of a
+GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64
+bit mode.</p>
+</div>
+
+<div class="doc_text">
+<p>32 bit linkage area</p>
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>20</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>64 bit linkage area</p>
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>24</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>32</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>40</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>The <i>parameter area</i> is used to store arguments being passed to a callee
+function. Following the PowerPC ABI, the first few arguments are actually
+passed in registers, with the space in the parameter area unused. However, if
+there are not enough registers or the callee is a thunk or vararg function,
+these register arguments can be spilled into the parameter area. Thus, the
+parameter area must be large enough to store all the parameters for the largest
+call sequence made by the caller. The size must also be mimimally large enough
+to spill registers r3-r10. This allows callees blind to the call signature,
+such as thunks and vararg functions, enough space to cache the argument
+registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64
+bit mode.) Also note that since the parameter area is a fixed offset from the
+top of the frame, that a callee can access its spilt arguments using fixed
+offsets from the stack pointer (or base pointer.)</p>
+</div>
+
+<div class="doc_text">
+<p>Combining the information about the linkage, parameter areas and alignment. A
+stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
+mode.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
+alloca then space is added to the stack, the linkage and parameter areas are
+shifted to top of stack, and the new space is available immediately below the
+linkage and parameter areas. The cost of shifting the linkage and parameter
+areas is minor since only the link value needs to be copied. The link value can
+be easily fetched by adding the original frame size to the base pointer. Note
+that allocations in the dynamic space need to observe 16 byte aligment.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>locals area</i> is where the llvm compiler reserves space for local
+variables.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>saved registers area</i> is where the llvm compiler spills callee saved
+registers on entry to the callee.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_prolog">Prolog/Epilog</a>
+</div>
+
+<div class="doc_text">
+<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
+the following exceptions. Callee saved registers are spilled after the frame is
+created. This allows the llvm epilog/prolog support to be common with other
+targets. The base pointer callee saved register r31 is saved in the TOC slot of
+linkage area. This simplifies allocation of space for the base pointer and
+makes it convenient to locate programatically and during debugging.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_dynamic">Dynamic Allocation</a>
+</div>
+
+<div class="doc_text">
+<p></p>
+</div>
+
+<div class="doc_text">
+<p><i>TODO - More to come.</i></p>
+</div>
+
+
<!-- *********************************************************************** -->
<hr>
<address>