Initial support for the CMake build system.

[oota-llvm.git] / docs / CodeGenerator.html
diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html

index bc82b46735b307eca8187098d0795493db4ab4a5..ad0a5b558826419dd75ee5bec89a89b077175718 100644
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -26,7 +26,7 @@
        <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
        <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
        <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
-      <li><a href="#mregisterinfo">The <tt>MRegisterInfo</tt> class</a></li>
+      <li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
        <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
        <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
        <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
@@ -84,6 +84,7 @@
    </li>
    <li><a href="#targetimpls">Target-specific Implementation Notes</a>
      <ul>
+    <li><a href="#tailcallopt">Tail call optimization</a></li>
      <li><a href="#x86">The X86 backend</a></li>
      <li><a href="#ppc">The PowerPC backend</a>
        <ul>
@@ -388,14 +389,13 @@ operations.  Among other things, this class indicates:</p>
  
  <!-- ======================================================================= -->
  <div class="doc_subsection">
-  <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a>
+  <a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
  </div>
  
  <div class="doc_text">
  
-<p>The <tt>MRegisterInfo</tt> class (which will eventually be renamed to
-<tt>TargetRegisterInfo</tt>) is used to describe the register file of the
-target and any interactions between the registers.</p>
+<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register
+file of the target and any interactions between the registers.</p>
  
  <p>Registers in the code generator are represented in the code generator by
  unsigned integers.  Physical registers (those that actually exist in the target
@@ -408,8 +408,8 @@ register (used for assembly output and debugging dumps) and a set of aliases
  (used to indicate whether one register overlaps with another).
  </p>
  
-<p>In addition to the per-register description, the <tt>MRegisterInfo</tt> class
-exposes a set of processor specific register classes (instances of the
+<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
+class exposes a set of processor specific register classes (instances of the
  <tt>TargetRegisterClass</tt> class).  Each register class contains sets of
  registers that have the same properties (for example, they are all 32-bit
  integer registers).  Each SSA virtual register created by the instruction
@@ -621,9 +621,9 @@ copies a virtual register into or out of a physical register when needed.</p>
  
  <div class="doc_code">
  <pre>
-int %test(int %X, int %Y) {
-  %Z = div int %X, %Y
-  ret int %Z
+define i32 @test(i32 %X, i32 %Y) {
+  %Z = udiv i32 %X, %Y
+  ret i32 %Z
  }
  </pre>
  </div>
@@ -719,8 +719,7 @@ comes from.</p>
  corresponds one-to-one with the LLVM function input to the instruction selector.
  In addition to a list of basic blocks, the <tt>MachineFunction</tt> contains a
  a <tt>MachineConstantPool</tt>, a <tt>MachineFrameInfo</tt>, a
-<tt>MachineFunctionInfo</tt>, a <tt>SSARegMap</tt>, and a set of live in and
-live out registers for the function.  See
+<tt>MachineFunctionInfo</tt>, and a <tt>MachineRegisterInfo</tt>.  See
  <tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
  
  </div>
@@ -748,16 +747,14 @@ explains how they work and some of the rationale behind their design.</p>
  <p>
  Instruction Selection is the process of translating LLVM code presented to the
  code generator into target-specific machine instructions.  There are several
-well-known ways to do this in the literature.  In LLVM there are two main forms:
-the SelectionDAG based instruction selector framework and an old-style 'simple'
-instruction selector, which effectively peephole selects each LLVM instruction
-into a series of machine instructions.  We recommend that all targets use the
-SelectionDAG infrastructure.
+well-known ways to do this in the literature.  LLVM uses a SelectionDAG-based
+instruction selector.
  </p>
  
  <p>Portions of the DAG instruction selector are generated from the target 
  description (<tt>*.td</tt>) files.  Our goal is for the entire instruction
-selector to be generated from these <tt>.td</tt> files.</p>
+selector to be generated from these <tt>.td</tt> files, though currently
+there are still things that require custom C++ code.</p>
  </div>
  
  <!-- _______________________________________________________________________ -->
@@ -790,10 +787,11 @@ define multiple values.  For example, a combined div/rem operation will define
  both the dividend and the remainder. Many other situations require multiple
  values as well.  Each node also has some number of operands, which are edges 
  to the node defining the used value.  Because nodes may define multiple values,
-edges are represented by instances of the <tt>SDOperand</tt> class, which is 
+edges are represented by instances of the <tt>SDValue</tt> class, which is 
  a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node and result
  value being used, respectively.  Each value produced by an <tt>SDNode</tt> has
-an associated <tt>MVT::ValueType</tt> indicating what type the value is.</p>
+an associated <tt>MVT</tt> (Machine Value Type) indicating the type of the
+value.</p>
  
  <p>SelectionDAGs contain two different kinds of values: those that represent
  data flow and those that represent control flow dependencies.  Data values are
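The SDValue change above describes a DAG edge as a <SDNode, unsigned> pair: the node, plus which of its results is being used. A minimal C++ sketch of that idea follows. MiniSDNode, MiniSDValue and divRemEdges are hypothetical names, and plain strings stand in for MVT; this only illustrates the pairing and is not LLVM's actual class layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an SDNode: an opcode name plus the
// list of value types it produces (a node may define several values).
struct MiniSDNode {
    std::string opcode;
    std::vector<std::string> resultTypes;  // e.g. "i32"; stand-in for MVT
};

// Hypothetical stand-in for SDValue: an edge is the pair <node, result number>.
struct MiniSDValue {
    const MiniSDNode *node;
    unsigned resNo;

    // The type of the particular result this edge refers to.
    const std::string &getValueType() const { return node->resultTypes.at(resNo); }
};

// Build the two edges of a combined div/rem node: result 0 is the quotient,
// result 1 is the remainder. Both edges point at the same node.
inline std::vector<MiniSDValue> divRemEdges(const MiniSDNode &n) {
    return {MiniSDValue{&n, 0}, MiniSDValue{&n, 1}};
}
```

Because both edges refer to the same node, a user of the value distinguishes the quotient from the remainder purely by the result number.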
@@ -859,13 +857,28 @@ an illegal DAG into a legal DAG.</p>
  rest of the code generation passes are run.</p>
  
  <p>One great way to visualize what is going on here is to take advantage of a 
-few LLC command line options.  In particular, the <tt>-view-isel-dags</tt>
-option pops up a window with the SelectionDAG input to the Select phase for all
-of the code compiled (if you only get errors printed to the console while using
-this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
-your system</a> to add support for it).  The <tt>-view-sched-dags</tt> option
-views the SelectionDAG output from the Select phase and input to the Scheduler
-phase.</p>
+few LLC command line options.  The following options pop up a window displaying
+the SelectionDAG at specific times (if you only get errors printed to the console
+while using this, you probably
+<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a> to
+add support for it).</p>
+
+<ul>
+<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built, before
+    the first optimization pass.</li>
+<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
+<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
+    optimization pass.</li>
+<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
+<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
+</ul>
+
+<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency graph.
+This graph is based on the final SelectionDAG, with nodes that must be
+scheduled together bundled into a single scheduling-unit node, and with
+immediate operands and other nodes that aren't relevant for scheduling
+omitted.
+</p>
  
  </div>
  
@@ -1111,7 +1124,8 @@ primarily because it is a work in progress and is not yet finished:</p>
  <li>There is no great way to support matching complex addressing modes yet.  In
      the future, we will extend pattern fragments to allow them to define
      multiple values (e.g. the four operands of the <a href="#x86_memory">X86
-    addressing mode</a>).  In addition, we'll extend fragments so that a
+    addressing mode</a>, which are currently matched with custom C++ code).
+    In addition, we'll extend fragments so that a
      fragment can match multiple different patterns.</li>
  <li>We don't automatically infer flags like isStore/isLoad yet.</li>
  <li>We don't automatically generate the set of supported registers and
@@ -1290,7 +1304,7 @@ X86 architecture, the registers <tt>EAX</tt>, <tt>AX</tt> and
  marked as <i>aliased</i> in LLVM. Given a particular architecture, you
  can check which registers are aliased by inspecting its
  <tt>RegisterInfo.td</tt> file. Moreover, the method
-<tt>MRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
+<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
  all the physical registers aliased to the register <tt>p_reg</tt>.</p>
  
  <p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
@@ -1308,10 +1322,10 @@ this code can be used:
  bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
                                        unsigned v_reg,
                                        unsigned p_reg) {
-  assert(MRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
+  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
           "Target register must be physical");
-  const TargetRegisterClass *trc = mf.getSSARegMap()->getRegClass(v_reg);
-  return trc->contains(p_reg);
+  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
+  return trc-&gt;contains(p_reg);
  }
  </pre>
  </div>
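The alias-set and register-class text in the hunks above (EAX, AX and AL overlapping; a register class containing a set of registers) can be illustrated with a small self-contained model. The register numbers, the aliasTable and regsOverlap helpers, and MiniRegClass are all hypothetical: they mimic the questions TargetRegisterInfo::getAliasSet and TargetRegisterClass::contains answer, not how LLVM implements them.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical physical register numbers; real targets get these from
// TableGen-generated enums, and this numbering is made up.
enum PhysReg : unsigned { NoReg = 0, EAX = 1, AX = 2, AL = 3, AH = 4, EBX = 5 };

// Hypothetical alias table: every register that overlaps a given register.
// This mirrors the idea behind getAliasSet, not its signature.
const std::map<unsigned, std::vector<unsigned>> &aliasTable() {
    static const std::map<unsigned, std::vector<unsigned>> table = {
        {EAX, {AX, AL, AH}},
        {AX, {EAX, AL, AH}},
        {AL, {EAX, AX}},   // AL and AH do not overlap each other
        {AH, {EAX, AX}},
        {EBX, {}},
    };
    return table;
}

// Two registers interfere if they are the same register or alias each other.
bool regsOverlap(unsigned a, unsigned b) {
    if (a == b) return true;
    const auto &aliases = aliasTable().at(a);
    return std::find(aliases.begin(), aliases.end(), b) != aliases.end();
}

// Hypothetical register class: a set of registers sharing properties,
// e.g. "all 32-bit integer registers".
struct MiniRegClass {
    std::set<unsigned> regs;
    bool contains(unsigned r) const { return regs.count(r) != 0; }
};
```

With a 32-bit class holding only EAX and EBX, a check in the style of compatible_class would accept EAX for a virtual register of that class but reject AX.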
@@ -1333,14 +1347,14 @@ physical registers, different virtual registers never share the same
  number. The smallest virtual register is normally assigned the number
  1024. This may change, so, in order to know which is the first virtual
  register, you should access
-<tt>MRegisterInfo::FirstVirtualRegister</tt>. Any register whose
+<tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
  number is greater than or equal to
-<tt>MRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
+<tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
  register. Whereas physical registers are statically defined in a
  <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
  application developer, that is not the case with virtual registers.
  In order to create new virtual registers, use the method
-<tt>SSARegMap::createVirtualRegister()</tt>. This method will return a
+<tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method will return a
  virtual register with the highest code.
  </p>
  
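The numbering rule in the hunk above (any register number at or above FirstVirtualRegister, normally 1024, is virtual) and the behavior of createVirtualRegister (hand out the next, highest number) can be sketched as follows. MiniRegInfo and the hard-coded constant are assumptions made for illustration; real code should use the constant and methods LLVM provides.

```cpp
#include <cassert>
#include <vector>

// Assumption taken from the text: virtual register numbers start at 1024.
// Real code should query TargetRegisterInfo::FirstVirtualRegister instead.
constexpr unsigned kFirstVirtualRegister = 1024;

// Register number 0 is treated as "no register" in this sketch.
inline bool isPhysicalRegister(unsigned reg) { return reg != 0 && reg < kFirstVirtualRegister; }
inline bool isVirtualRegister(unsigned reg) { return reg >= kFirstVirtualRegister; }

// Hypothetical miniature of the bookkeeping MachineRegisterInfo performs:
// each new virtual register gets the next (highest) number and remembers
// its register class, identified here by a plain integer ID.
class MiniRegInfo {
public:
    unsigned createVirtualRegister(unsigned regClassID) {
        classes_.push_back(regClassID);
        return kFirstVirtualRegister + static_cast<unsigned>(classes_.size()) - 1;
    }
    unsigned getRegClass(unsigned vreg) const {
        assert(isVirtualRegister(vreg) && "not a virtual register");
        return classes_.at(vreg - kFirstVirtualRegister);
    }
private:
    std::vector<unsigned> classes_;  // index = vreg - kFirstVirtualRegister
};
```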
@@ -1392,7 +1406,7 @@ overwritten by the values of virtual registers while still alive.</p>
  
  <p>There are two ways to map virtual registers to physical registers (or to
  memory slots). The first way, that we will call <i>direct mapping</i>,
-is based on the use of methods of the classes <tt>MRegisterInfo</tt>,
+is based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
  and <tt>MachineOperand</tt>. The second way, that we will call
  <i>indirect mapping</i>, relies on the <tt>VirtRegMap</tt> class in
  order to insert loads and stores sending and getting values to and from
@@ -1406,8 +1420,8 @@ target function being compiled in order to get and store values in
  memory. To assign a physical register to a virtual register present in
  a given operand, use <tt>MachineOperand::setReg(p_reg)</tt>. To insert
  a store instruction, use
-<tt>MRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
-instruction, use <tt>MRegisterInfo::loadRegFromStackSlot</tt>.</p>
+<tt>TargetRegisterInfo::storeRegToStackSlot(...)</tt>, and to insert a load
+instruction, use <tt>TargetRegisterInfo::loadRegFromStackSlot</tt>.</p>
  
  <p>The indirect mapping shields the application developer from the
  complexities of inserting load and store instructions. In order to map
@@ -1465,12 +1479,12 @@ instance, in situations where an instruction such as <tt>%a = ADD %b
  <div class="doc_code">
  <pre>
  %a = MOVE %b
-%a = ADD %a %b
+%a = ADD %a %c
  </pre>
  </div>
  
  <p>Notice that, internally, the second instruction is represented as
-<tt>ADD %a[def/use] %b</tt>. I.e., the register operand <tt>%a</tt> is
+<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is
  both used and defined by the instruction.</p>
  
  </div>
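The two-address rewrite shown in the hunk above (<tt>%a = ADD %b %c</tt> becoming <tt>%a = MOVE %b</tt> followed by <tt>%a = ADD %a %c</tt>) can be sketched on a toy instruction representation. MiniInstr and toTwoAddress are hypothetical, and the sketch assumes the destination differs from the second source; LLVM's actual two-address pass also handles commuting and other cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-address instruction: dst = opcode src0, src1.
struct MiniInstr {
    std::string opcode;
    std::string dst;
    std::vector<std::string> srcs;
};

// Rewrite "dst = OP a, b" into "dst = MOVE a" then "dst = OP dst, b",
// so the first source operand of OP is tied to (def/use of) dst.
std::vector<MiniInstr> toTwoAddress(const MiniInstr &mi) {
    if (mi.srcs.size() != 2 || mi.srcs[0] == mi.dst)
        return {mi};  // not a binary op, or already in two-address form
    // If dst were the second source, the MOVE would clobber it first.
    assert(mi.srcs[1] != mi.dst && "sketch assumes dst is not the second source");
    MiniInstr copy{"MOVE", mi.dst, {mi.srcs[0]}};
    MiniInstr tied{mi.opcode, mi.dst, {mi.dst, mi.srcs[1]}};
    return {copy, tied};
}
```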
@@ -1527,7 +1541,7 @@ instance, a sequence of instructions such as:</p>
  </div>
  
  <p>Instructions can be folded with the
-<tt>MRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
+<tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
  taken when folding instructions; a folded instruction can be quite
  different from the original instruction. See
  <tt>LiveIntervals::addIntervalsForSpills</tt> in
@@ -1619,7 +1633,51 @@ are specific to the code generator for a particular target.</p>
  
  </div>
  
+<!-- ======================================================================= -->
+<div class="doc_subsection">
+  <a name="tailcallopt">Tail call optimization</a>
+</div>
+
+<div class="doc_text">
+  <p>Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:
+    <ul>
+      <li>Caller and callee have the calling convention <tt>fastcc</tt>.</li>
+      <li>The call is a tail call, i.e. it is in tail position (a <tt>ret</tt> immediately follows the call and either returns the call's value or is void).</li>
+      <li>The <tt>-tailcallopt</tt> option is enabled.</li>
+      <li>Platform-specific constraints are met.</li>
+    </ul>
+  </p>
  
+  <p>x86/x86-64 constraints:
+    <ul>
+      <li>No variable argument lists are used.</li>
+      <li>On x86-64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+    </ul>
+  </p>
+  <p>PowerPC constraints:
+    <ul>
+      <li>No variable argument lists are used.</li>
+      <li>No byval parameters are used.</li>
+      <li>On ppc32/64, when generating GOT/PIC code, only module-local calls (visibility = hidden or protected) are supported.</li>
+    </ul>
+  </p>
+  <p>Example:</p>
+  <p>Compile with <tt>llc -tailcallopt test.ll</tt>.
+    <div class="doc_code">
+      <pre>
+declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
+
+define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
+  %l1 = add i32 %in1, %in2
+  %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
+  ret i32 %tmp
+}</pre>
+    </div>
+  </p>
+  <p>Implications of <tt>-tailcallopt</tt>:</p>
+  <p>To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. This currently causes each <tt>fastcc</tt> call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack, so performance might be worse in such cases.</p>
+  <p>On x86 and x86-64, one register is reserved for indirect tail calls (e.g. via a function pointer), so there is one fewer register available for integer argument passing. For x86 this means 2 registers are used (if the <tt>inreg</tt> parameter attribute is used), and for x86-64 this means 5 registers are used.</p>
+</div>
  <!-- ======================================================================= -->
  <div class="doc_subsection">
    <a name="x86">The X86 backend</a>
@@ -1628,11 +1686,9 @@ are specific to the code generator for a particular target.</p>
  <div class="doc_text">
  
  <p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory.  This
-code generator currently targets a generic P6-like processor.  As such, it
-produces a few P6-and-above instructions (like conditional moves), but it does
-not make use of newer features like MMX or SSE.  In the future, the X86 backend
-will have sub-target support added for specific processor families and 
-implementations.</p>
+code generator is capable of targeting a variety of x86-32 and x86-64
+processors, and includes support for ISA extensions such as MMX and SSE.
+</p>
  
  </div>
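The tail call conditions added to the documentation in this commit can be summarized as a single predicate. The following C++ sketch only restates the listed rules; CallSiteInfo, canTailCallOptimize and the enums are hypothetical and do not correspond to the X86 or PowerPC backends' code.

```cpp
#include <cassert>

// Hypothetical summary of a call site; field names are illustrative only.
enum class CallConv { C, Fast };
enum class Target { X86_32, X86_64, PPC32, PPC64 };
enum class Visibility { Default, Hidden, Protected };

struct CallSiteInfo {
    CallConv callerCC, calleeCC;
    bool isTailCall;       // the call is marked as a tail call
    bool inTailPosition;   // ret follows immediately and returns its value (or is void)
    bool isVarArg;         // variable argument list is used
    bool hasByValParams;   // any byval parameter is used
    bool isPIC;            // generating GOT/PIC code
    Visibility calleeVisibility;
};

// Restatement of the documented rules; not the backends' implementation.
bool canTailCallOptimize(const CallSiteInfo &cs, Target t, bool tailCallOptEnabled) {
    // Generic conditions shared by all supported targets.
    if (!tailCallOptEnabled) return false;
    if (cs.callerCC != CallConv::Fast || cs.calleeCC != CallConv::Fast) return false;
    if (!cs.isTailCall || !cs.inTailPosition) return false;
    if (cs.isVarArg) return false;  // listed for both x86 and PowerPC
    // Module-local means hidden or protected visibility.
    bool moduleLocal = cs.calleeVisibility != Visibility::Default;
    switch (t) {
    case Target::X86_32:
        return true;
    case Target::X86_64:
        return !cs.isPIC || moduleLocal;
    case Target::PPC32:
    case Target::PPC64:
        return !cs.hasByValParams && (!cs.isPIC || moduleLocal);
    }
    return false;
}
```

Checking the generic conditions before the per-target switch mirrors the structure of the documented list: common requirements first, then the platform-specific constraints.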