"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
+ <meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>The LLVM Target-Independent Code Generator</title>
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<li><a href="#selectiondag_future">Future directions for the
SelectionDAG</a></li>
</ul></li>
- <li><a href="#liveinterval_analysis">Live Interval Analysis</a>
+ <li><a href="#liveintervals">Live Intervals</a>
<ul>
<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
+ <li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
</ul></li>
<li><a href="#regalloc">Register Allocation</a>
<ul>
<li><a href="#targetimpls">Target-specific Implementation Notes</a>
<ul>
<li><a href="#x86">The X86 backend</a></li>
- </ul>
- </li>
+ <li><a href="#ppc">The PowerPC backend</a>
+ <ul>
+ <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
+ <li><a href="#ppc_frame">Frame Layout</a></li>
+ <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
+ <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
+ </ul></li>
+ </ul></li>
</ol>
<div class="doc_author">
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>,
- <a href="mailto:isanbard@gmail.com">Bill Wendling</a>, and
+ <a href="mailto:isanbard@gmail.com">Bill Wendling</a>,
<a href="mailto:pronesto@gmail.com">Fernando Magno Quintao
- Pereira</a></p>
+ Pereira</a> and
+ <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
</div>
<div class="doc_warning">
<li>the type to use for shift amounts</li>
<li>various high-level characteristics, like whether it is profitable to turn
division by a constant into a multiplication sequence</li>
-</ol>
+</ul>
</div>
<p>
Instruction Selection is the process of translating LLVM code presented to the
code generator into target-specific machine instructions. There are several
-well-known ways to do this in the literature. In LLVM there are two main forms:
-the SelectionDAG based instruction selector framework and an old-style 'simple'
-instruction selector, which effectively peephole selects each LLVM instruction
-into a series of machine instructions. We recommend that all targets use the
-SelectionDAG infrastructure.
+well-known ways to do this in the literature. LLVM uses a SelectionDAG based
+instruction selector.
</p>
<p>Portions of the DAG instruction selector are generated from the target
this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
your system</a> to add support for it). The <tt>-view-sched-dags</tt> option
views the SelectionDAG output from the Select phase and input to the Scheduler
-phase.</p>
+phase. The <tt>-view-sunit-dags</tt> option views the ScheduleDAG, which is
+based on the final SelectionDAG, with nodes that must be scheduled as a unit
+bundled together into a single node, and with immediate operands and other
+nodes that aren't relevent for scheduling omitted.
+</p>
</div>
fragment can match multiple different patterns.</li>
<li>We don't automatically infer flags like isStore/isLoad yet.</li>
<li>We don't automatically generate the set of supported registers and
- operations for the <a href="#"selectiondag_legalize>Legalizer</a> yet.</li>
+ operations for the <a href="#selectiondag_legalize">Legalizer</a> yet.</li>
<li>We don't have a way of tying in custom legalized nodes yet.</li>
</ul>
<ol>
<li>Optional function-at-a-time selection.</li>
<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
-</li>
</ol>
</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
- <a name="liveinterval_analysis">Live Interval Analysis</a>
+ <a name="liveintervals">Live Intervals</a>
</div>
<div class="doc_text">
-<p>Live Interval Analysis identifies the ranges where a variable is <i>live</i>.
-It's used by the <a href="#regalloc">register allocator pass</a> to determine
-if two or more virtual registers which require the same register are live at
-the same point in the program (conflict). When this situation occurs, one
-virtual register must be <i>spilt</i>.</p>
+<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
+They are used by some <a href="#regalloc">register allocator</a> passes to
+determine if two or more virtual registers which require the same physical
+register are live at the same point in the program (i.e., they conflict). When
+this situation occurs, one virtual register must be <i>spilled</i>.</p>
</div>
<div class="doc_text">
-<p>The first step to determining the live intervals of variables is to
+<p>The first step in determining the live intervals of variables is to
calculate the set of registers that are immediately dead after the
-instruction (i.e., the instruction calculates the value, but it is never
-used) and the set of registers that are used by the instruction, but are
-never used after the instruction (i.e., they are killed). Live variable
-information is computed for each <i>virtual</i> and <i>register
-allocatable</i> physical register in the function. LLVM assumes that
-physical registers are only live within a single basic block. This allows
-it to do a single, local analysis to resolve physical register lifetimes in
-each basic block. If a physical register is not register allocatable (e.g.,
+instruction (i.e., the instruction calculates the value, but it is
+never used) and the set of registers that are used by the instruction,
+but are never used after the instruction (i.e., they are killed). Live
+variable information is computed for each <i>virtual</i> register and
+<i>register allocatable</i> physical register in the function. This
+is done in a very efficient manner because it uses SSA to sparsely
+compute lifetime information for virtual registers (which are in SSA
+form) and only has to track physical registers within a block. Before
+register allocation, LLVM can assume that physical registers are only
+live within a single basic block. This allows it to do a single,
+local analysis to resolve physical register lifetimes within each
+basic block. If a physical register is not register allocatable (e.g.,
a stack pointer or condition codes), it is not tracked.</p>
<p>Physical registers may be live in to or out of a function. Live in values
-are typically arguments in register. Live out values are typically return
+are typically arguments in registers. Live out values are typically return
values in registers. Live in values are marked as such, and are given a dummy
-"defining" instruction during live interval analysis. If the last basic block
-of a function is a <tt>return</tt>, then it's marked as using all live-out
+"defining" instruction during live intervals analysis. If the last basic block
+of a function is a <tt>return</tt>, then it's marked as using all live out
values in the function.</p>
<p><tt>PHI</tt> nodes need to be handled specially, because the calculation
of the live variable information from a depth first traversal of the CFG of
-the function won't guarantee that a virtual register is defined before it's
-used. When a <tt>PHI</tt> node is encounted, only the definition is
-handled, because the uses will be handled in other basic blocks.</p>
+the function won't guarantee that a virtual register used by the <tt>PHI</tt>
+node is defined before it's used. When a <tt>PHI</tt> node is encounted, only
+the definition is handled, because the uses will be handled in other basic
+blocks.</p>
<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
assignment at the end of the current basic block and traverse the successor
</div>
-<!-- FIXME:
-
-A. General Overview
-B. Describe Default RA (Linear Scan)
- 1. LiveVariable Analysis
- a. All physical register references presumed dead across BBs
- b. Mark live-in regs as live-in
- c. Calculate LV info in DFS order
- 1) We'll see def of vreg before its uses
- 2) PHI nodes are treated specially
- a) Only handle its def
- b) Uses handled in other BBs
- 3) Handle all uses and defs
- a) Handle implicit preg uses
- (1) "TargetInstrDescriptor" from "TargetInstructionInfo"
- b) Handle explicit preg and vreg uses
- c) Handle implicit preg defs
- (1) "TargetInstrDescriptor" from "TargetInstructionInfo"
- d) Handle explicit preg and vreg defs
- 4) Use of vreg marks it killed (last use in BB)
- a) Updates (expands) live range
- b) Marks vreg as alive in dominating blocks
- 5) Use of preg updates info and used tables
- 6) Def of vreg defaults to "dead"
- a) Expanded later (see B.1.c.4)
- 7) Def of preg updates info, used, RegsKilled, and RegsDead tables.
- 8) Handle virt assigns from PHI nodes at the bottom of the BB
- a) If successor block has PHI nodes
- (1) Simulate an assignment at the end of current BB
- (i.e., mark it as alive in current BB)
- 9) If last block is a "return"
- a) Mark it as using all live-out values
- 10) Kill all pregs available at the end of the BB
- d. Update "RegistersDead" and "RegistersKilled"
- 1) RegistersDead - This map keeps track of all of the registers that
- are dead immediately after an instruction executes, which are not
- dead after the operands are evaluated. In practice, this only
- contains registers which are defined by an instruction, but never
- used.
- 2) RegistersKilled - This map keeps track of all of the registers that
- are dead immediately after an instruction reads its operands. If an
- instruction does not have an entry in this map, it kills no
- registers.
- 2. LiveInterval Analysis
- a. Use LV pass to conservatively compute live intervals for vregs and pregs
- b. For some ordering of the machine instrs [1,N], a live interval is an
- interval [i,j) where 1 <= i <= j < N for which a variable is live
- c. Function has live ins
- 1) Insert dummy instr at beginning
- 2) Pretend dummy instr "defines" values
- d. Number each machine instruction -- depth-first order
- 1) An interval [i, j) == Live interval for reg v if there is no
- instr with num j' > j s.t. v is live at j' and there is no instr
- with number i' < i s.t. v is live at i'
- 2) Intervals can have holes: [1,20), [50,65), [1000,1001)
- e. Handle line-in values
- f. Compute live intervals
- 1) Each live range is assigned a value num within the live interval
- 2) vreg
- a) May be defined multiple times (due to phi and 2-addr elimination)
- b) Live only within defining BB
- (1) Single kill after def in BB
- c) Lives to end of defining BB, potentially across some BBs
- (1) Add range that goes from def to end of defining BB
- (2) Iterate over all BBs that the var is completely live in
- (a) add [instrIndex(begin), InstrIndex(end)+4) to LI
- (3) Vreg is live from start of any killing block to 'use'
- d) If seeing vreg again (because of phi or 2-addr elimination)
- (1) If 2-addr elim, then vreg is 1st op and a def-and-use
- (a) Didn't realize there are 2 values in LI
- (b) Need to take LR that defs vreg and split it into 2 vals
- (1) Delete initial value (from first def to redef)
- (2) Get new value num (#1)
- (3) Value#0 is now defined by the 2-addr instr
- (4) Add new LR which replaces the range for input copy
- (2) Else phi-elimination
- (a) If first redef of vreg, change LR in PHI block to be
- a different Value Number
- (b) Each variable def is only live until the end of the BB
- 3) preg
- a) Cannot be live across BB
- b) Lifetime must end somewhere in its defining BB
- c) Dead at def instr, if not used after def
- (1) Interval: [defSlot(def), defSlot(def) + 1)
- d) Killed by subsequent instr, if not dead on def
- (1) Interval: [defSlot(def), useSlot(kill) + 1)
- e) If neither, then it's live-in to func and never used
- (1) Interval: [start, start + 1)
- e. Join intervals
- f. Compute spill weights
- g. Coalesce vregs
- h. Remove identity moves
- 3. Linear Scan RA
- a.
-
-
- /// VarInfo - This represents the regions where a virtual register is live in
- /// the program. We represent this with three different pieces of
- /// information: the instruction that uniquely defines the value, the set of
- /// blocks the instruction is live into and live out of, and the set of
- /// non-phi instructions that are the last users of the value.
- ///
- /// In the common case where a value is defined and killed in the same block,
- /// DefInst is the defining inst, there is one killing instruction, and
- /// AliveBlocks is empty.
- ///
- /// Otherwise, the value is live out of the block. If the value is live
- /// across any blocks, these blocks are listed in AliveBlocks. Blocks where
- /// the liveness range ends are not included in AliveBlocks, instead being
- /// captured by the Kills set. In these blocks, the value is live into the
- /// block (unless the value is defined and killed in the same block) and lives
- /// until the specified instruction. Note that there cannot ever be a value
- /// whose Kills set contains two instructions from the same basic block.
- ///
- /// PHI nodes complicate things a bit. If a PHI node is the last user of a
- /// value in one of its predecessor blocks, it is not listed in the kills set,
- /// but does include the predecessor block in the AliveBlocks set (unless that
- /// block also defines the value). This leads to the (perfectly sensical)
- /// situation where a value is defined in a block, and the last use is a phi
- /// node in the successor. In this case, DefInst will be the defining
- /// instruction, AliveBlocks is empty (the value is not live across any
- /// blocks) and Kills is empty (phi nodes are not included). This is sensical
- /// because the value must be live to the end of the block, but is not live in
- /// any successor blocks.
-
- -->
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="liveintervals_analysis">Live Intervals Analysis</a>
+</div>
+
+<div class="doc_text">
+
+<p>We now have the information available to perform the live intervals analysis
+and build the live intervals themselves. We start off by numbering the basic
+blocks and machine instructions. We then handle the "live-in" values. These
+are in physical registers, so the physical register is assumed to be killed by
+the end of the basic block. Live intervals for virtual registers are computed
+for some ordering of the machine instructions <tt>[1, N]</tt>. A live interval
+is an interval <tt>[i, j)</tt>, where <tt>1 <= i <= j < N</tt>, for which a
+variable is live.</p>
+
+<p><i><b>More to come...</b></i></p>
+
+</div>
<!-- ======================================================================= -->
<div class="doc_subsection">
<div class="doc_text">
-<p>The <i>Register Allocation problem</i> consists in mapping a
-program <i>P<sub>v</sub></i>, that can use an unbounded number of
-virtual registers, to a program <i>P<sub>p</sub></i> that contains a
-finite (possibly small) number of physical registers. Each target
-architecture has a different number of physical registers. If the
-number of physical registers is not enough to accommodate all the
-virtual registers, some of them will have to be mapped into
-memory. These virtuals are called <i>spilled virtuals</i>.</p>
+<p>The <i>Register Allocation problem</i> consists in mapping a program
+<i>P<sub>v</sub></i>, that can use an unbounded number of virtual
+registers, to a program <i>P<sub>p</sub></i> that contains a finite
+(possibly small) number of physical registers. Each target architecture has
+a different number of physical registers. If the number of physical
+registers is not enough to accommodate all the virtual registers, some of
+them will have to be mapped into memory. These virtuals are called
+<i>spilled virtuals</i>.</p>
</div>
<div class="doc_code">
<pre>
-bool RegMapping_Fer::compatible_class(MachineFunction &mf,
+bool RegMapping_Fer::compatible_class(MachineFunction &mf,
unsigned v_reg,
unsigned p_reg) {
- assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
+ assert(MRegisterInfo::isPhysicalRegister(p_reg) &&
"Target register must be physical");
const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
return trc->contains(p_reg);
<tt>MachineOperand::isDef()</tt> informs if that registers is being
defined.</p>
-<p>We will call physical registers present in the LLVM bytecode before
+<p>We will call physical registers present in the LLVM bitcode before
register allocation <i>pre-colored registers</i>. Pre-colored
registers are used in many different situations, for instance, to pass
parameters of functions calls, and to store results of particular
<li><b>i386-unknown-freebsd5.3</b> - FreeBSD 5.3</li>
<li><b>i686-pc-cygwin</b> - Cygwin on Win32</li>
<li><b>i686-pc-mingw32</b> - MingW on Win32</li>
+<li><b>i386-pc-mingw32msvc</b> - MingW crosscompiler on Linux</li>
<li><b>i686-apple-darwin*</b> - Apple Darwin on X86</li>
</ul>
</div>
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="x86_cc">X86 Calling Conventions supported</a>
+</div>
+
+
+<div class="doc_text">
+
+<p>The folowing target-specific calling conventions are known to backend:</p>
+
+<ul>
+<li><b>x86_StdCall</b> - stdcall calling convention seen on Microsoft Windows
+platform (CC ID = 64).</li>
+<li><b>x86_FastCall</b> - fastcall calling convention seen on Microsoft Windows
+platform (CC ID = 65).</li>
+</ul>
+
+</div>
+
<!-- _______________________________________________________________________ -->
<div class="doc_subsubsection">
<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
</div>
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+ <a name="ppc">The PowerPC backend</a>
+</div>
+
+<div class="doc_text">
+<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
+code generation is retargetable to several variations or <i>subtargets</i> of
+the PowerPC ISA; including ppc32, ppc64 and altivec.
+</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_abi">LLVM PowerPC ABI</a>
+</div>
+
+<div class="doc_text">
+<p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
+relative (PIC) or static addressing for accessing global values, so no TOC (r2)
+is used. Second, r31 is used as a frame pointer to allow dynamic growth of a
+stack frame. LLVM takes advantage of having no TOC to provide space to save
+the frame pointer in the PowerPC linkage area of the caller frame. Other
+details of PowerPC ABI can be found at <a href=
+"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
+>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The
+64 bit ABI is similar except space for GPRs are 8 bytes wide (not 4) and r13 is
+reserved for system use.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_frame">Frame Layout</a>
+</div>
+
+<div class="doc_text">
+<p>The size of a PowerPC frame is usually fixed for the duration of a
+function’s invocation. Since the frame is fixed size, all references into
+the frame can be accessed via fixed offsets from the stack pointer. The
+exception to this is when dynamic alloca or variable sized arrays are present,
+then a base pointer (r31) is used as a proxy for the stack pointer and stack
+pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is
+not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to
+16 bytes, so that space allocated for altivec vectors will be properly
+aligned.</p>
+<p>An invocation frame is layed out as follows (low memory at top);</p>
+</div>
+
+<div class="doc_text">
+<table class="layout">
+ <tr>
+ <td>Linkage<br><br></td>
+ </tr>
+ <tr>
+ <td>Parameter area<br><br></td>
+ </tr>
+ <tr>
+ <td>Dynamic area<br><br></td>
+ </tr>
+ <tr>
+ <td>Locals area<br><br></td>
+ </tr>
+ <tr>
+ <td>Saved registers area<br><br></td>
+ </tr>
+ <tr style="border-style: none hidden none hidden;">
+ <td><br></td>
+ </tr>
+ <tr>
+ <td>Previous Frame<br><br></td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>The <i>linkage</i> area is used by a callee to save special registers prior
+to allocating its own frame. Only three entries are relevant to LLVM. The
+first entry is the previous stack pointer (sp), aka link. This allows probing
+tools like gdb or exception handlers to quickly scan the frames in the stack. A
+function epilog can also use the link to pop the frame from the stack. The
+third entry in the linkage area is used to save the return address from the lr
+register. Finally, as mentioned above, the last entry is used to save the
+previous frame pointer (r31.) The entries in the linkage area are the size of a
+GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64
+bit mode.</p>
+</div>
+
+<div class="doc_text">
+<p>32 bit linkage area</p>
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>4</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>12</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>20</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>64 bit linkage area</p>
+<table class="layout">
+ <tr>
+ <td>0</td>
+ <td>Saved SP (r1)</td>
+ </tr>
+ <tr>
+ <td>8</td>
+ <td>Saved CR</td>
+ </tr>
+ <tr>
+ <td>16</td>
+ <td>Saved LR</td>
+ </tr>
+ <tr>
+ <td>24</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>32</td>
+ <td>Reserved</td>
+ </tr>
+ <tr>
+ <td>40</td>
+ <td>Saved FP (r31)</td>
+ </tr>
+</table>
+</div>
+
+<div class="doc_text">
+<p>The <i>parameter area</i> is used to store arguments being passed to a callee
+function. Following the PowerPC ABI, the first few arguments are actually
+passed in registers, with the space in the parameter area unused. However, if
+there are not enough registers or the callee is a thunk or vararg function,
+these register arguments can be spilled into the parameter area. Thus, the
+parameter area must be large enough to store all the parameters for the largest
+call sequence made by the caller. The size must also be mimimally large enough
+to spill registers r3-r10. This allows callees blind to the call signature,
+such as thunks and vararg functions, enough space to cache the argument
+registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64
+bit mode.) Also note that since the parameter area is a fixed offset from the
+top of the frame, that a callee can access its spilt arguments using fixed
+offsets from the stack pointer (or base pointer.)</p>
+</div>
+
+<div class="doc_text">
+<p>Combining the information about the linkage, parameter areas and alignment. A
+stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
+mode.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
+alloca then space is added to the stack, the linkage and parameter areas are
+shifted to top of stack, and the new space is available immediately below the
+linkage and parameter areas. The cost of shifting the linkage and parameter
+areas is minor since only the link value needs to be copied. The link value can
+be easily fetched by adding the original frame size to the base pointer. Note
+that allocations in the dynamic space need to observe 16 byte aligment.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>locals area</i> is where the llvm compiler reserves space for local
+variables.</p>
+</div>
+
+<div class="doc_text">
+<p>The <i>saved registers area</i> is where the llvm compiler spills callee saved
+registers on entry to the callee.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_prolog">Prolog/Epilog</a>
+</div>
+
+<div class="doc_text">
+<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
+the following exceptions. Callee saved registers are spilled after the frame is
+created. This allows the llvm epilog/prolog support to be common with other
+targets. The base pointer callee saved register r31 is saved in the TOC slot of
+linkage area. This simplifies allocation of space for the base pointer and
+makes it convenient to locate programatically and during debugging.</p>
+</div>
+
+<!-- _______________________________________________________________________ -->
+<div class="doc_subsubsection">
+ <a name="ppc_dynamic">Dynamic Allocation</a>
+</div>
+
+<div class="doc_text">
+<p></p>
+</div>
+
+<div class="doc_text">
+<p><i>TODO - More to come.</i></p>
+</div>
+
+
<!-- *********************************************************************** -->
<hr>
<address>