docs/CodeGenerator.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <meta http-equiv="content-type" content="text/html; charset=utf-8">
   6   <title>The LLVM Target-Independent Code Generator</title>
   7   <link rel="stylesheet" href="llvm.css" type="text/css">
   8 </head>
   9 <body>
  10
  11 <div class="doc_title">
  12   The LLVM Target-Independent Code Generator
  13 </div>
  14
  15 <ol>
  16   <li><a href="#introduction">Introduction</a>
  17     <ul>
  18       <li><a href="#required">Required components in the code generator</a></li>
  19       <li><a href="#high-level-design">The high-level design of the code
  20           generator</a></li>
  21       <li><a href="#tablegen">Using TableGen for target description</a></li>
  22     </ul>
  23   </li>
  24   <li><a href="#targetdesc">Target description classes</a>
  25     <ul>
  26       <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
  27       <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
  28       <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
  29       <li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
  30       <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
  31       <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
  32       <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
  33       <li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
  34     </ul>
  35   </li>
  36   <li><a href="#codegendesc">The "Machine" Code Generator classes</a>
  37     <ul>
  38     <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
  39     <li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
  40                                      class</a></li>
  41     <li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
  42     </ul>
  43   </li>
  44   <li><a href="#mc">The "MC" Layer</a>
  45     <ul>
  46     <li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
  47     <li><a href="#mccontext">The <tt>MCContext</tt> class</a>
  48     <li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
  49     <li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
  50     <li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
  51     </ul>
  52   </li>
  53   <li><a href="#codegenalgs">Target-independent code generation algorithms</a>
  54     <ul>
  55     <li><a href="#instselect">Instruction Selection</a>
  56       <ul>
  57       <li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
  58       <li><a href="#selectiondag_process">SelectionDAG Code Generation
  59                                           Process</a></li>
  60       <li><a href="#selectiondag_build">Initial SelectionDAG
  61                                         Construction</a></li>
  62       <li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
  63       <li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
  64       <li><a href="#selectiondag_optimize">SelectionDAG Optimization
  65                                            Phase: the DAG Combiner</a></li>
  66       <li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
  67       <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
  68                                         Phase</a></li>
  69       <li><a href="#selectiondag_future">Future directions for the
  70                                          SelectionDAG</a></li>
  71       </ul></li>
  72      <li><a href="#liveintervals">Live Intervals</a>
  73        <ul>
  74        <li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
  75        <li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
  76        </ul></li>
  77     <li><a href="#regalloc">Register Allocation</a>
  78       <ul>
  79       <li><a href="#regAlloc_represent">How registers are represented in
  80                                         LLVM</a></li>
  81       <li><a href="#regAlloc_howTo">Mapping virtual registers to physical
  82                                     registers</a></li>
  83       <li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
  84       <li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
  85       <li><a href="#regAlloc_fold">Instruction folding</a></li>
  86       <li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
  87       </ul></li>
  88     <li><a href="#codeemit">Code Emission</a></li>
  89     </ul>
  90   </li>
  91   <li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
  92
  93   <li><a href="#targetimpls">Target-specific Implementation Notes</a>
  94     <ul>
  95     <li><a href="#tailcallopt">Tail call optimization</a></li>
  96     <li><a href="#sibcallopt">Sibling call optimization</a></li>
  97     <li><a href="#x86">The X86 backend</a></li>
  98     <li><a href="#ppc">The PowerPC backend</a>
  99       <ul>
 100       <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
 101       <li><a href="#ppc_frame">Frame Layout</a></li>
 102       <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
 103       <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
 104       </ul></li>
 105     </ul></li>
 106
 107 </ol>
 108
 109 <div class="doc_author">
 110   <p>Written by the LLVM Team.</p>
 111 </div>
 112
 113 <div class="doc_warning">
 114   <p>Warning: This is a work in progress.</p>
 115 </div>
 116
 117 <!-- *********************************************************************** -->
 118 <div class="doc_section">
 119   <a name="introduction">Introduction</a>
 120 </div>
 121 <!-- *********************************************************************** -->
 122
 123 <div class="doc_text">
 124
 125 <p>The LLVM target-independent code generator is a framework that provides a
 126    suite of reusable components for translating the LLVM internal representation
 127    to the machine code for a specified target&mdash;either in assembly form
 128    (suitable for a static compiler) or in binary machine code format (usable for
 129    a JIT compiler). The LLVM target-independent code generator consists of six
 130    main components:</p>
 131
 132 <ol>
 133   <li><a href="#targetdesc">Abstract target description</a> interfaces which
 134       capture important properties about various aspects of the machine,
 135       independently of how they will be used.  These interfaces are defined in
 136       <tt>include/llvm/Target/</tt>.</li>
 137
 138   <li>Classes used to represent the <a href="#codegendesc">code being
 139       generated</a> for a target.  These classes are intended to be abstract
 140       enough to represent the machine code for <i>any</i> target machine.  These
 141       classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
 142       concepts like "constant pool entries" and "jump tables" are explicitly
 143       exposed.</li>
 144
 145   <li>Classes and algorithms used to represent code as the object file level,
 146       the <a href="#mc">MC Layer</a>.  These classes represent assembly level
 147       constructs like labels, sections, and instructions.  At this level,
 148       concepts like "constant pool entries" and "jump tables" don't exist.</li>
 149
 150   <li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
 151       various phases of native code generation (register allocation, scheduling,
 152       stack frame representation, etc).  This code lives
 153       in <tt>lib/CodeGen/</tt>.</li>
 154
 155   <li><a href="#targetimpls">Implementations of the abstract target description
 156       interfaces</a> for particular targets.  These machine descriptions make
 157       use of the components provided by LLVM, and can optionally provide custom
 158       target-specific passes, to build complete code generators for a specific
 159       target.  Target descriptions live in <tt>lib/Target/</tt>.</li>
 160
 161   <li><a href="#jit">The target-independent JIT components</a>.  The LLVM JIT is
 162       completely target independent (it uses the <tt>TargetJITInfo</tt>
 163       structure to interface for target-specific issues.  The code for the
 164       target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
 165 </ol>
 166
 167 <p>Depending on which part of the code generator you are interested in working
 168    on, different pieces of this will be useful to you.  In any case, you should
 169    be familiar with the <a href="#targetdesc">target description</a>
 170    and <a href="#codegendesc">machine code representation</a> classes.  If you
 171    want to add a backend for a new target, you will need
 172    to <a href="#targetimpls">implement the target description</a> classes for
 173    your new target and understand the <a href="LangRef.html">LLVM code
 174    representation</a>.  If you are interested in implementing a
 175    new <a href="#codegenalgs">code generation algorithm</a>, it should only
 176    depend on the target-description and machine code representation classes,
 177    ensuring that it is portable.</p>
 178
 179 </div>
 180
 181 <!-- ======================================================================= -->
 182 <div class="doc_subsection">
 183  <a name="required">Required components in the code generator</a>
 184 </div>
 185
 186 <div class="doc_text">
 187
 188 <p>The two pieces of the LLVM code generator are the high-level interface to the
 189    code generator and the set of reusable components that can be used to build
 190    target-specific backends.  The two most important interfaces
 191    (<a href="#targetmachine"><tt>TargetMachine</tt></a>
 192    and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
 193    required to be defined for a backend to fit into the LLVM system, but the
 194    others must be defined if the reusable code generator components are going to
 195    be used.</p>
 196
 197 <p>This design has two important implications.  The first is that LLVM can
 198    support completely non-traditional code generation targets.  For example, the
 199    C backend does not require register allocation, instruction selection, or any
 200    of the other standard components provided by the system.  As such, it only
 201    implements these two interfaces, and does its own thing.  Another example of
 202    a code generator like this is a (purely hypothetical) backend that converts
 203    LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
 204
 205 <p>This design also implies that it is possible to design and implement
 206    radically different code generators in the LLVM system that do not make use
 207    of any of the built-in components.  Doing so is not recommended at all, but
 208    could be required for radically different targets that do not fit into the
 209    LLVM machine description model: FPGAs for example.</p>
 210
 211 </div>
 212
 213 <!-- ======================================================================= -->
 214 <div class="doc_subsection">
 215  <a name="high-level-design">The high-level design of the code generator</a>
 216 </div>
 217
 218 <div class="doc_text">
 219
 220 <p>The LLVM target-independent code generator is designed to support efficient
 221    and quality code generation for standard register-based microprocessors.
 222    Code generation in this model is divided into the following stages:</p>
 223
 224 <ol>
 225   <li><b><a href="#instselect">Instruction Selection</a></b> &mdash; This phase
 226       determines an efficient way to express the input LLVM code in the target
 227       instruction set.  This stage produces the initial code for the program in
 228       the target instruction set, then makes use of virtual registers in SSA
 229       form and physical registers that represent any required register
 230       assignments due to target constraints or calling conventions.  This step
 231       turns the LLVM code into a DAG of target instructions.</li>
 232
 233   <li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> &mdash;
 234       This phase takes the DAG of target instructions produced by the
 235       instruction selection phase, determines an ordering of the instructions,
 236       then emits the instructions
 237       as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
 238       Note that we describe this in the <a href="#instselect">instruction
 239       selection section</a> because it operates on
 240       a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
 241
 242   <li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> &mdash;
 243       This optional stage consists of a series of machine-code optimizations
 244       that operate on the SSA-form produced by the instruction selector.
 245       Optimizations like modulo-scheduling or peephole optimization work
 246       here.</li>
 247
 248   <li><b><a href="#regalloc">Register Allocation</a></b> &mdash; The target code
 249       is transformed from an infinite virtual register file in SSA form to the
 250       concrete register file used by the target.  This phase introduces spill
 251       code and eliminates all virtual register references from the program.</li>
 252
 253   <li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> &mdash; Once
 254       the machine code has been generated for the function and the amount of
 255       stack space required is known (used for LLVM alloca's and spill slots),
 256       the prolog and epilog code for the function can be inserted and "abstract
 257       stack location references" can be eliminated.  This stage is responsible
 258       for implementing optimizations like frame-pointer elimination and stack
 259       packing.</li>
 260
 261   <li><b><a href="#latemco">Late Machine Code Optimizations</a></b> &mdash;
 262       Optimizations that operate on "final" machine code can go here, such as
 263       spill code scheduling and peephole optimizations.</li>
 264
 265   <li><b><a href="#codeemit">Code Emission</a></b> &mdash; The final stage
 266       actually puts out the code for the current function, either in the target
 267       assembler format or in machine code.</li>
 268 </ol>
 269
 270 <p>The code generator is based on the assumption that the instruction selector
 271    will use an optimal pattern matching selector to create high-quality
 272    sequences of native instructions.  Alternative code generator designs based
 273    on pattern expansion and aggressive iterative peephole optimization are much
 274    slower.  This design permits efficient compilation (important for JIT
 275    environments) and aggressive optimization (used when generating code offline)
 276    by allowing components of varying levels of sophistication to be used for any
 277    step of compilation.</p>
 278
 279 <p>In addition to these stages, target implementations can insert arbitrary
 280    target-specific passes into the flow.  For example, the X86 target uses a
 281    special pass to handle the 80x87 floating point stack architecture.  Other
 282    targets with unusual requirements can be supported with custom passes as
 283    needed.</p>
 284
 285 </div>
 286
 287 <!-- ======================================================================= -->
 288 <div class="doc_subsection">
 289  <a name="tablegen">Using TableGen for target description</a>
 290 </div>
 291
 292 <div class="doc_text">
 293
 294 <p>The target description classes require a detailed description of the target
 295    architecture.  These target descriptions often have a large amount of common
 296    information (e.g., an <tt>add</tt> instruction is almost identical to a
 297    <tt>sub</tt> instruction).  In order to allow the maximum amount of
 298    commonality to be factored out, the LLVM code generator uses
 299    the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
 300    chunks of the target machine, which allows the use of domain-specific and
 301    target-specific abstractions to reduce the amount of repetition.</p>
 302
 303 <p>As LLVM continues to be developed and refined, we plan to move more and more
 304    of the target description to the <tt>.td</tt> form.  Doing so gives us a
 305    number of advantages.  The most important is that it makes it easier to port
 306    LLVM because it reduces the amount of C++ code that has to be written, and
 307    the surface area of the code generator that needs to be understood before
 308    someone can get something working.  Second, it makes it easier to change
 309    things. In particular, if tables and other things are all emitted
 310    by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
 311    update all of the targets to a new interface.</p>
 312
 313 </div>
 314
 315 <!-- *********************************************************************** -->
 316 <div class="doc_section">
 317   <a name="targetdesc">Target description classes</a>
 318 </div>
 319 <!-- *********************************************************************** -->
 320
 321 <div class="doc_text">
 322
 323 <p>The LLVM target description classes (located in the
 324    <tt>include/llvm/Target</tt> directory) provide an abstract description of
 325    the target machine independent of any particular client.  These classes are
 326    designed to capture the <i>abstract</i> properties of the target (such as the
 327    instructions and registers it has), and do not incorporate any particular
 328    pieces of code generation algorithms.</p>
 329
 330 <p>All of the target description classes (except the
 331    <tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
 332    subclassed by the concrete target implementation, and have virtual methods
 333    implemented.  To get to these implementations, the
 334    <tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
 335    that should be implemented by the target.</p>
 336
 337 </div>
 338
 339 <!-- ======================================================================= -->
 340 <div class="doc_subsection">
 341   <a name="targetmachine">The <tt>TargetMachine</tt> class</a>
 342 </div>
 343
 344 <div class="doc_text">
 345
 346 <p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
 347    access the target-specific implementations of the various target description
 348    classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
 349    <tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.).  This class is
 350    designed to be specialized by a concrete target implementation
 351    (e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
 352    methods.  The only required target description class is
 353    the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
 354    generator components are to be used, the other interfaces should be
 355    implemented as well.</p>
 356
 357 </div>
 358
 359 <!-- ======================================================================= -->
 360 <div class="doc_subsection">
 361   <a name="targetdata">The <tt>TargetData</tt> class</a>
 362 </div>
 363
 364 <div class="doc_text">
 365
 366 <p>The <tt>TargetData</tt> class is the only required target description class,
 367    and it is the only class that is not extensible (you cannot derived a new
 368    class from it).  <tt>TargetData</tt> specifies information about how the
 369    target lays out memory for structures, the alignment requirements for various
 370    data types, the size of pointers in the target, and whether the target is
 371    little-endian or big-endian.</p>
 372
 373 </div>
 374
 375 <!-- ======================================================================= -->
 376 <div class="doc_subsection">
 377   <a name="targetlowering">The <tt>TargetLowering</tt> class</a>
 378 </div>
 379
 380 <div class="doc_text">
 381
 382 <p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
 383    selectors primarily to describe how LLVM code should be lowered to
 384    SelectionDAG operations.  Among other things, this class indicates:</p>
 385
 386 <ul>
 387   <li>an initial register class to use for various <tt>ValueType</tt>s,</li>
 388
 389   <li>which operations are natively supported by the target machine,</li>
 390
 391   <li>the return type of <tt>setcc</tt> operations,</li>
 392
 393   <li>the type to use for shift amounts, and</li>
 394
 395   <li>various high-level characteristics, like whether it is profitable to turn
 396       division by a constant into a multiplication sequence</li>
 397 </ul>
 398
 399 </div>
 400
 401 <!-- ======================================================================= -->
 402 <div class="doc_subsection">
 403   <a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
 404 </div>
 405
 406 <div class="doc_text">
 407
 408 <p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
 409    of the target and any interactions between the registers.</p>
 410
 411 <p>Registers in the code generator are represented in the code generator by
 412    unsigned integers.  Physical registers (those that actually exist in the
 413    target description) are unique small numbers, and virtual registers are
 414    generally large.  Note that register #0 is reserved as a flag value.</p>
 415
 416 <p>Each register in the processor description has an associated
 417    <tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
 418    register (used for assembly output and debugging dumps) and a set of aliases
 419    (used to indicate whether one register overlaps with another).</p>
 420
 421 <p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
 422    class exposes a set of processor specific register classes (instances of the
 423    <tt>TargetRegisterClass</tt> class).  Each register class contains sets of
 424    registers that have the same properties (for example, they are all 32-bit
 425    integer registers).  Each SSA virtual register created by the instruction
 426    selector has an associated register class.  When the register allocator runs,
 427    it replaces virtual registers with a physical register in the set.</p>
 428
 429 <p>The target-specific implementations of these classes is auto-generated from
 430    a <a href="TableGenFundamentals.html">TableGen</a> description of the
 431    register file.</p>
 432
 433 </div>
 434
 435 <!-- ======================================================================= -->
 436 <div class="doc_subsection">
 437   <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
 438 </div>
 439
 440 <div class="doc_text">
 441
 442 <p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
 443    instructions supported by the target. It is essentially an array of
 444    <tt>TargetInstrDescriptor</tt> objects, each of which describes one
 445    instruction the target supports. Descriptors define things like the mnemonic
 446    for the opcode, the number of operands, the list of implicit register uses
 447    and defs, whether the instruction has certain target-independent properties
 448    (accesses memory, is commutable, etc), and holds any target-specific
 449    flags.</p>
 450
 451 </div>
 452
 453 <!-- ======================================================================= -->
 454 <div class="doc_subsection">
 455   <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
 456 </div>
 457
 458 <div class="doc_text">
 459
 460 <p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
 461    stack frame layout of the target. It holds the direction of stack growth, the
 462    known stack alignment on entry to each function, and the offset to the local
 463    area.  The offset to the local area is the offset from the stack pointer on
 464    function entry to the first location where function data (local variables,
 465    spill locations) can be stored.</p>
 466
 467 </div>
 468
 469 <!-- ======================================================================= -->
 470 <div class="doc_subsection">
 471   <a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
 472 </div>
 473
 474 <div class="doc_text">
 475
 476 <p>The <tt>TargetSubtarget</tt> class is used to provide information about the
 477    specific chip set being targeted.  A sub-target informs code generation of
 478    which instructions are supported, instruction latencies and instruction
 479    execution itinerary; i.e., which processing units are used, in what order,
 480    and for how long.</p>
 481
 482 </div>
 483
 484
 485 <!-- ======================================================================= -->
 486 <div class="doc_subsection">
 487   <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
 488 </div>
 489
 490 <div class="doc_text">
 491
 492 <p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
 493    Just-In-Time code generator to perform target-specific activities, such as
 494    emitting stubs.  If a <tt>TargetMachine</tt> supports JIT code generation, it
 495    should provide one of these objects through the <tt>getJITInfo</tt>
 496    method.</p>
 497
 498 </div>
 499
 500 <!-- *********************************************************************** -->
 501 <div class="doc_section">
 502   <a name="codegendesc">Machine code description classes</a>
 503 </div>
 504 <!-- *********************************************************************** -->
 505
 506 <div class="doc_text">
 507
 508 <p>At the high-level, LLVM code is translated to a machine specific
 509    representation formed out of
 510    <a href="#machinefunction"><tt>MachineFunction</tt></a>,
 511    <a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
 512    and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
 513    in <tt>include/llvm/CodeGen</tt>).  This representation is completely target
 514    agnostic, representing instructions in their most abstract form: an opcode
 515    and a series of operands.  This representation is designed to support both an
 516    SSA representation for machine code, as well as a register allocated, non-SSA
 517    form.</p>
 518
 519 </div>
 520
 521 <!-- ======================================================================= -->
 522 <div class="doc_subsection">
 523   <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
 524 </div>
 525
 526 <div class="doc_text">
 527
 528 <p>Target machine instructions are represented as instances of the
 529    <tt>MachineInstr</tt> class.  This class is an extremely abstract way of
 530    representing machine instructions.  In particular, it only keeps track of an
 531    opcode number and a set of operands.</p>
 532
 533 <p>The opcode number is a simple unsigned integer that only has meaning to a
 534    specific backend.  All of the instructions for a target should be defined in
 535    the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
 536    auto-generated from this description.  The <tt>MachineInstr</tt> class does
 537    not have any information about how to interpret the instruction (i.e., what
 538    the semantics of the instruction are); for that you must refer to the
 539    <tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
 540
 541 <p>The operands of a machine instruction can be of several different types: a
 542    register reference, a constant integer, a basic block reference, etc.  In
 543    addition, a machine operand should be marked as a def or a use of the value
 544    (though only registers are allowed to be defs).</p>
 545
 546 <p>By convention, the LLVM code generator orders instruction operands so that
 547    all register definitions come before the register uses, even on architectures
 548    that are normally printed in other orders.  For example, the SPARC add
 549    instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers
 550    and stores the result into the "%i3" register.  In the LLVM code generator,
 551    the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
 552    destination first.</p>
 553
 554 <p>Keeping destination (definition) operands at the beginning of the operand
 555    list has several advantages.  In particular, the debugging printer will print
 556    the instruction like this:</p>
 557
 558 <div class="doc_code">
 559 <pre>
 560 %r3 = add %i1, %i2
 561 </pre>
 562 </div>
 563
 564 <p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
 565    instructions</a> whose only def is the first operand.</p>
 566
 567 </div>
 568
 569 <!-- _______________________________________________________________________ -->
 570 <div class="doc_subsubsection">
 571   <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
 572 </div>
 573
 574 <div class="doc_text">
 575
 576 <p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
 577    located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file.  The
 578    <tt>BuildMI</tt> functions make it easy to build arbitrary machine
 579    instructions.  Usage of the <tt>BuildMI</tt> functions look like this:</p>
 580
 581 <div class="doc_code">
 582 <pre>
 583 // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
 584 // instruction.  The '1' specifies how many operands will be added.
 585 MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
 586
 587 // Create the same instr, but insert it at the end of a basic block.
 588 MachineBasicBlock &amp;MBB = ...
 589 BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
 590
 591 // Create the same instr, but insert it before a specified iterator point.
 592 MachineBasicBlock::iterator MBBI = ...
 593 BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
 594
 595 // Create a 'cmp Reg, 0' instruction, no destination reg.
 596 MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
 597 // Create an 'sahf' instruction which takes no operands and stores nothing.
 598 MI = BuildMI(X86::SAHF, 0);
 599
 600 // Create a self looping branch instruction.
 601 BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
 602 </pre>
 603 </div>
 604
 605 <p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
 606    have to specify the number of operands that the machine instruction will
 607    take.  This allows for efficient memory allocation.  You also need to specify
 608    if operands default to be uses of values, not definitions.  If you need to
 609    add a definition operand (other than the optional destination register), you
 610    must explicitly mark it as such:</p>
 611
 612 <div class="doc_code">
 613 <pre>
 614 MI.addReg(Reg, RegState::Define);
 615 </pre>
 616 </div>
 617
 618 </div>
 619
 620 <!-- _______________________________________________________________________ -->
 621 <div class="doc_subsubsection">
 622   <a name="fixedregs">Fixed (preassigned) registers</a>
 623 </div>
 624
 625 <div class="doc_text">
 626
 627 <p>One important issue that the code generator needs to be aware of is the
 628    presence of fixed registers.  In particular, there are often places in the
 629    instruction stream where the register allocator <em>must</em> arrange for a
 630    particular value to be in a particular register.  This can occur due to
 631    limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
 632    with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
 633    calling conventions.  In any case, the instruction selector should emit code
 634    that copies a virtual register into or out of a physical register when
 635    needed.</p>
 636
 637 <p>For example, consider this simple LLVM example:</p>
 638
 639 <div class="doc_code">
 640 <pre>
 641 define i32 @test(i32 %X, i32 %Y) {
 642   %Z = udiv i32 %X, %Y
 643   ret i32 %Z
 644 }
 645 </pre>
 646 </div>
 647
 648 <p>The X86 instruction selector produces this machine code for the <tt>div</tt>
 649    and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
 650    get this):</p>
 651
 652 <div class="doc_code">
 653 <pre>
 654 ;; Start of div
 655 %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
 656 %reg1027 = sar %reg1024, 31
 657 %EDX = mov %reg1027           ;; Sign extend X into EDX
 658 idiv %reg1025                 ;; Divide by Y (in reg1025)
 659 %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
 660
 661 ;; Start of ret
 662 %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
 663 ret
 664 </pre>
 665 </div>
 666
 667 <p>By the end of code generation, the register allocator has coalesced the
 668    registers and deleted the resultant identity moves producing the following
 669    code:</p>
 670
 671 <div class="doc_code">
 672 <pre>
 673 ;; X is in EAX, Y is in ECX
 674 mov %EAX, %EDX
 675 sar %EDX, 31
 676 idiv %ECX
 677 ret
 678 </pre>
 679 </div>
 680
 681 <p>This approach is extremely general (if it can handle the X86 architecture, it
 682    can handle anything!) and allows all of the target specific knowledge about
 683    the instruction stream to be isolated in the instruction selector.  Note that
 684    physical registers should have a short lifetime for good code generation, and
 685    all physical registers are assumed dead on entry to and exit from basic
 686    blocks (before register allocation).  Thus, if you need a value to be live
 687    across basic block boundaries, it <em>must</em> live in a virtual
 688    register.</p>
 689
 690 </div>
 691
 692 <!-- _______________________________________________________________________ -->
 693 <div class="doc_subsubsection">
 694   <a name="ssa">Machine code in SSA form</a>
 695 </div>
 696
 697 <div class="doc_text">
 698
 699 <p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
 700    maintained in SSA-form until register allocation happens.  For the most part,
 701    this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
 702    become machine code PHI nodes, and virtual registers are only allowed to have
 703    a single definition.</p>
 704
 705 <p>After register allocation, machine code is no longer in SSA-form because
 706    there are no virtual registers left in the code.</p>
 707
 708 </div>
 709
 710 <!-- ======================================================================= -->
 711 <div class="doc_subsection">
 712   <a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
 713 </div>
 714
 715 <div class="doc_text">
 716
 717 <p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
 718    (<tt><a href="#machineinstr">MachineInstr</a></tt> instances).  It roughly
 719    corresponds to the LLVM code input to the instruction selector, but there can
 720    be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
 721    machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
 722    "<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
 723    comes from.</p>
 724
 725 </div>
 726
 727 <!-- ======================================================================= -->
 728 <div class="doc_subsection">
 729   <a name="machinefunction">The <tt>MachineFunction</tt> class</a>
 730 </div>
 731
 732 <div class="doc_text">
 733
 734 <p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
 735    (<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances).  It
 736    corresponds one-to-one with the LLVM function input to the instruction
 737    selector.  In addition to a list of basic blocks,
 738    the <tt>MachineFunction</tt> contains a a <tt>MachineConstantPool</tt>,
 739    a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
 740    <tt>MachineRegisterInfo</tt>.  See
 741    <tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
 742
 743 </div>
 744
 745
 746 <!-- *********************************************************************** -->
 747 <div class="doc_section">
 748   <a name="mc">The "MC" Layer</a>
 749 </div>
 750 <!-- *********************************************************************** -->
 751
 752 <div class="doc_text">
 753
 754 <p>
 755 The MC Layer is used to represent and process code at the raw machine code
 756 level, devoid of "high level" information like "constant pools", "jump tables",
 757 "global variables" or anything like that.  At this level, LLVM handles things
 758 like label names, machine instructions, and sections in the object file.  The
 759 code in this layer is used for a number of important purposes: the tail end of
 760 the code generator uses it to write a .s or .o file, and it is also used by the
 761 llvm-mc tool to implement standalone machine codeassemblers and disassemblers.
 762 </p>
 763
 764 <p>
 765 This section describes some of the important classes.  There are also a number
 766 of important subsystems that interact at this layer, they are described later
 767 in this manual.
 768 </p>
 769
 770 </div>
 771
 772
 773 <!-- ======================================================================= -->
 774 <div class="doc_subsection">
 775   <a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
 776 </div>
 777
 778 <div class="doc_text">
 779
 780 <p>
 781 MCStreamer is best thought of as an assembler API.  It is an abstract API which
 782 is <em>implemented</em> in different ways (e.g. to output a .s file, output an
 783 ELF .o file, etc) but whose API correspond directly to what you see in a .s
 784 file.  MCStreamer has one method per directive, such as EmitLabel,
 785 EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
 786 directly correspond to assembly level directives.  It also has an
 787 EmitInstruction method, which is used to output an MCInst to the streamer.
 788 </p>
 789
 790 <p>
 791 This API is most important for two clients: the llvm-mc stand-alone assembler is
 792 effectively a parser that parses a line, then invokes a method on MCStreamer. In
 793 the code generator, the <a href="#codeemit">Code Emission</a> phase of the code
 794 generator lowers higher level LLVM IR and Machine* constructs down to the MC
 795 layer, emitting directives through MCStreamer.</p>
 796
 797 <p>
 798 On the implementation side of MCStreamer, there are two major implementations:
 799 one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
 800 file (MCObjectStreamer).  MCAsmStreamer is a straight-forward implementation
 801 that prints out a directive for each method (e.g. EmitValue -&gt; .byte), but
 802 MCObjectStreamer implements a full assembler.
 803 </p>
 804
 805 </div>
 806
 807 <!-- ======================================================================= -->
 808 <div class="doc_subsection">
 809   <a name="mccontext">The <tt>MCContext</tt> class</a>
 810 </div>
 811
 812 <div class="doc_text">
 813
 814 <p>
 815 The MCContext class is the owner of a variety of uniqued data structures at the
 816 MC layer, including symbols, sections, etc.  As such, this is the class that you
 817 interact with to create symbols and sections.  This class can not be subclassed.
 818 </p>
 819
 820 </div>
 821
 822 <!-- ======================================================================= -->
 823 <div class="doc_subsection">
 824   <a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
 825 </div>
 826
 827 <div class="doc_text">
 828
 829 <p>
 830 The MCSymbol class represents a symbol (aka label) in the assembly file.  There
 831 are two interesting kinds of symbols: assembler temporary symbols, and normal
 832 symbols.  Assembler temporary symbols are used and processed by the assembler
 833 but are discarded when the object file is produced.  The distinction is usually
 834 represented by adding a prefix to the label, for example "L" labels are
 835 assembler temporary labels in MachO.
 836 </p>
 837
 838 <p>MCSymbols are created by MCContext and uniqued there.  This means that
 839 MCSymbols can be compared for pointer equivalence to find out if they are the
 840 same symbol.  Note that pointer inequality does not guarantee the labels will
 841 end up at different addresses though.  It's perfectly legal to output something
 842 like this to the .s file:<p>
 843
 844 <pre>
 845   foo:
 846   bar:
 847     .byte 4
 848 </pre>
 849
 850 <p>In this case, both the foo and bar symbols will have the same address.</p>
 851
 852 </div>
 853
 854 <!-- ======================================================================= -->
 855 <div class="doc_subsection">
 856   <a name="mcsection">The <tt>MCSection</tt> class</a>
 857 </div>
 858
 859 <div class="doc_text">
 860
 861 <p>
 862 The MCSection class represents an object-file specific section. It is subclassed
 863 by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
 864 <tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
 865 by MCContext.  The MCStreamer has a notion of the current section, which can be
 866 changed with the SwitchToSection method (which corresponds to a ".section"
 867 directive in a .s file).
 868 </p>
 869
 870 </div>
 871
 872 <!-- ======================================================================= -->
 873 <div class="doc_subsection">
 874   <a name="mcinst">The <tt>MCInst</tt> class</a></li>
 875 </div>
 876
 877 <div class="doc_text">
 878
 879 <p>
 880 The MCInst class is a target-independent representation of an instruction.  It
 881 is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
 882 that holds a target-specific opcode and a vector of MCOperands.  MCOperand, in
 883 turn, is a simple discriminated union of three cases: 1) a simple immediate,
 884 2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
 885 MCExpr.
 886 </p>
 887
 888 <p>MCInst is the common currency used to represent machine instructions at the
 889 MC layer.  It is the type used by the instruction encoder, the instruction
 890 printer, and the type generated by the assembly parser and disassembler.
 891 </p>
 892
 893 </div>
 894
 895
 896 <!-- *********************************************************************** -->
 897 <div class="doc_section">
 898   <a name="codegenalgs">Target-independent code generation algorithms</a>
 899 </div>
 900 <!-- *********************************************************************** -->
 901
 902 <div class="doc_text">
 903
 904 <p>This section documents the phases described in the
 905    <a href="#high-level-design">high-level design of the code generator</a>.
 906    It explains how they work and some of the rationale behind their design.</p>
 907
 908 </div>
 909
 910 <!-- ======================================================================= -->
 911 <div class="doc_subsection">
 912   <a name="instselect">Instruction Selection</a>
 913 </div>
 914
 915 <div class="doc_text">
 916
 917 <p>Instruction Selection is the process of translating LLVM code presented to
 918    the code generator into target-specific machine instructions.  There are
 919    several well-known ways to do this in the literature.  LLVM uses a
 920    SelectionDAG based instruction selector.</p>
 921
 922 <p>Portions of the DAG instruction selector are generated from the target
 923    description (<tt>*.td</tt>) files.  Our goal is for the entire instruction
 924    selector to be generated from these <tt>.td</tt> files, though currently
 925    there are still things that require custom C++ code.</p>
 926
 927 </div>
 928
 929 <!-- _______________________________________________________________________ -->
 930 <div class="doc_subsubsection">
 931   <a name="selectiondag_intro">Introduction to SelectionDAGs</a>
 932 </div>
 933
 934 <div class="doc_text">
 935
 936 <p>The SelectionDAG provides an abstraction for code representation in a way
 937    that is amenable to instruction selection using automatic techniques
 938    (e.g. dynamic-programming based optimal pattern matching selectors). It is
 939    also well-suited to other phases of code generation; in particular,
 940    instruction scheduling (SelectionDAG's are very close to scheduling DAGs
 941    post-selection).  Additionally, the SelectionDAG provides a host
 942    representation where a large variety of very-low-level (but
 943    target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
 944    performed; ones which require extensive information about the instructions
 945    efficiently supported by the target.</p>
 946
 947 <p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
 948    <tt>SDNode</tt> class.  The primary payload of the <tt>SDNode</tt> is its
 949    operation code (Opcode) that indicates what operation the node performs and
 950    the operands to the operation.  The various operation node types are
 951    described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
 952    file.</p>
 953
 954 <p>Although most operations define a single value, each node in the graph may
 955    define multiple values.  For example, a combined div/rem operation will
 956    define both the dividend and the remainder. Many other situations require
 957    multiple values as well.  Each node also has some number of operands, which
 958    are edges to the node defining the used value.  Because nodes may define
 959    multiple values, edges are represented by instances of the <tt>SDValue</tt>
 960    class, which is a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node
 961    and result value being used, respectively.  Each value produced by
 962    an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
 963    indicating what the type of the value is.</p>
 964
 965 <p>SelectionDAGs contain two different kinds of values: those that represent
 966    data flow and those that represent control flow dependencies.  Data values
 967    are simple edges with an integer or floating point value type.  Control edges
 968    are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
 969    These edges provide an ordering between nodes that have side effects (such as
 970    loads, stores, calls, returns, etc).  All nodes that have side effects should
 971    take a token chain as input and produce a new one as output.  By convention,
 972    token chain inputs are always operand #0, and chain results are always the
 973    last value produced by an operation.</p>
 974
 975 <p>A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
 976    always a marker node with an Opcode of <tt>ISD::EntryToken</tt>.  The Root
 977    node is the final side-effecting node in the token chain. For example, in a
 978    single basic block function it would be the return node.</p>
 979
 980 <p>One important concept for SelectionDAGs is the notion of a "legal" vs.
 981    "illegal" DAG.  A legal DAG for a target is one that only uses supported
 982    operations and supported types.  On a 32-bit PowerPC, for example, a DAG with
 983    a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
 984    uses a SREM or UREM operation.  The
 985    <a href="#selectinodag_legalize_types">legalize types</a> and
 986    <a href="#selectiondag_legalize">legalize operations</a> phases are
 987    responsible for turning an illegal DAG into a legal DAG.</p>
 988
 989 </div>
 990
 991 <!-- _______________________________________________________________________ -->
 992 <div class="doc_subsubsection">
 993   <a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
 994 </div>
 995
 996 <div class="doc_text">
 997
 998 <p>SelectionDAG-based instruction selection consists of the following steps:</p>
 999
1000 <ol>
1001   <li><a href="#selectiondag_build">Build initial DAG</a> &mdash; This stage
1002       performs a simple translation from the input LLVM code to an illegal
1003       SelectionDAG.</li>
1004
1005   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; This
1006       stage performs simple optimizations on the SelectionDAG to simplify it,
1007       and recognize meta instructions (like rotates
1008       and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
1009       operations.  This makes the resultant code more efficient and
1010       the <a href="#selectiondag_select">select instructions from DAG</a> phase
1011       (below) simpler.</li>
1012
1013   <li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
1014       &mdash; This stage transforms SelectionDAG nodes to eliminate any types
1015       that are unsupported on the target.</li>
1016
1017   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; The
1018       SelectionDAG optimizer is run to clean up redundancies exposed by type
1019       legalization.</li>
1020
1021   <li><a href="#selectiondag_legalize">Legalize SelectionDAG Types</a> &mdash;
1022       This stage transforms SelectionDAG nodes to eliminate any types that are
1023       unsupported on the target.</li>
1024
1025   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; The
1026       SelectionDAG optimizer is run to eliminate inefficiencies introduced by
1027       operation legalization.</li>
1028
1029   <li><a href="#selectiondag_select">Select instructions from DAG</a> &mdash;
1030       Finally, the target instruction selector matches the DAG operations to
1031       target instructions.  This process translates the target-independent input
1032       DAG into another DAG of target instructions.</li>
1033
1034   <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
1035       &mdash; The last phase assigns a linear order to the instructions in the
1036       target-instruction DAG and emits them into the MachineFunction being
1037       compiled.  This step uses traditional prepass scheduling techniques.</li>
1038 </ol>
1039
1040 <p>After all of these steps are complete, the SelectionDAG is destroyed and the
1041    rest of the code generation passes are run.</p>
1042
1043 <p>One great way to visualize what is going on here is to take advantage of a
1044    few LLC command line options.  The following options pop up a window
1045    displaying the SelectionDAG at specific times (if you only get errors printed
1046    to the console while using this, you probably
1047    <a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
1048    to add support for it).</p>
1049
1050 <ul>
1051   <li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
1052       before the first optimization pass.</li>
1053
1054   <li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
1055
1056   <li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
1057       optimization pass.</li>
1058
1059   <li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
1060
1061   <li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
1062 </ul>
1063
1064 <p>The <tt>-view-sunit-dags</tt> displays the Scheduler's dependency graph.
1065    This graph is based on the final SelectionDAG, with nodes that must be
1066    scheduled together bundled into a single scheduling-unit node, and with
1067    immediate operands and other nodes that aren't relevant for scheduling
1068    omitted.</p>
1069
1070 </div>
1071
1072 <!-- _______________________________________________________________________ -->
1073 <div class="doc_subsubsection">
1074   <a name="selectiondag_build">Initial SelectionDAG Construction</a>
1075 </div>
1076
1077 <div class="doc_text">
1078
1079 <p>The initial SelectionDAG is na&iuml;vely peephole expanded from the LLVM
1080    input by the <tt>SelectionDAGLowering</tt> class in the
1081    <tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file.  The intent of
1082    this pass is to expose as much low-level, target-specific details to the
1083    SelectionDAG as possible.  This pass is mostly hard-coded (e.g. an
1084    LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
1085    <tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
1086    requires target-specific hooks to lower calls, returns, varargs, etc.  For
1087    these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
1088    interface is used.</p>
1089
1090 </div>
1091
1092 <!-- _______________________________________________________________________ -->
1093 <div class="doc_subsubsection">
1094   <a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
1095 </div>
1096
1097 <div class="doc_text">
1098
1099 <p>The Legalize phase is in charge of converting a DAG to only use the types
1100    that are natively supported by the target.</p>
1101
1102 <p>There are two main ways of converting values of unsupported scalar types to
1103    values of supported types: converting small types to larger types
1104    ("promoting"), and breaking up large integer types into smaller ones
1105    ("expanding").  For example, a target might require that all f32 values are
1106    promoted to f64 and that all i1/i8/i16 values are promoted to i32.  The same
1107    target might require that all i64 values be expanded into pairs of i32
1108    values.  These changes can insert sign and zero extensions as needed to make
1109    sure that the final code has the same behavior as the input.</p>
1110
1111 <p>There are two main ways of converting values of unsupported vector types to
1112    value of supported types: splitting vector types, multiple times if
1113    necessary, until a legal type is found, and extending vector types by adding
1114    elements to the end to round them out to legal types ("widening").  If a
1115    vector gets split all the way down to single-element parts with no supported
1116    vector type being found, the elements are converted to scalars
1117    ("scalarizing").</p>
1118
1119 <p>A target implementation tells the legalizer which types are supported (and
1120    which register class to use for them) by calling the
1121    <tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
1122
1123 </div>
1124
1125 <!-- _______________________________________________________________________ -->
1126 <div class="doc_subsubsection">
1127   <a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
1128 </div>
1129
1130 <div class="doc_text">
1131
1132 <p>The Legalize phase is in charge of converting a DAG to only use the
1133    operations that are natively supported by the target.</p>
1134
1135 <p>Targets often have weird constraints, such as not supporting every operation
1136    on every supported datatype (e.g. X86 does not support byte conditional moves
1137    and PowerPC does not support sign-extending loads from a 16-bit memory
1138    location).  Legalize takes care of this by open-coding another sequence of
1139    operations to emulate the operation ("expansion"), by promoting one type to a
1140    larger type that supports the operation ("promotion"), or by using a
1141    target-specific hook to implement the legalization ("custom").</p>
1142
1143 <p>A target implementation tells the legalizer which operations are not
1144    supported (and which of the above three actions to take) by calling the
1145    <tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
1146    constructor.</p>
1147
1148 <p>Prior to the existence of the Legalize passes, we required that every target
1149    <a href="#selectiondag_optimize">selector</a> supported and handled every
1150    operator and type even if they are not natively supported.  The introduction
1151    of the Legalize phases allows all of the canonicalization patterns to be
1152    shared across targets, and makes it very easy to optimize the canonicalized
1153    code because it is still in the form of a DAG.</p>
1154
1155 </div>
1156
1157 <!-- _______________________________________________________________________ -->
1158 <div class="doc_subsubsection">
1159   <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
1160   Combiner</a>
1161 </div>
1162
1163 <div class="doc_text">
1164
1165 <p>The SelectionDAG optimization phase is run multiple times for code
1166    generation, immediately after the DAG is built and once after each
1167    legalization.  The first run of the pass allows the initial code to be
1168    cleaned up (e.g. performing optimizations that depend on knowing that the
1169    operators have restricted type inputs).  Subsequent runs of the pass clean up
1170    the messy code generated by the Legalize passes, which allows Legalize to be
1171    very simple (it can focus on making code legal instead of focusing on
1172    generating <em>good</em> and legal code).</p>
1173
1174 <p>One important class of optimizations performed is optimizing inserted sign
1175    and zero extension instructions.  We currently use ad-hoc techniques, but
1176    could move to more rigorous techniques in the future.  Here are some good
1177    papers on the subject:</p>
1178
1179 <p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
1180    integer arithmetic</a>"<br>
1181    Kevin Redwine and Norman Ramsey<br>
1182    International Conference on Compiler Construction (CC) 2004</p>
1183
1184 <p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
1185    sign extension elimination</a>"<br>
1186    Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
1187    Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
1188    and Implementation.</p>
1189
1190 </div>
1191
1192 <!-- _______________________________________________________________________ -->
1193 <div class="doc_subsubsection">
1194   <a name="selectiondag_select">SelectionDAG Select Phase</a>
1195 </div>
1196
1197 <div class="doc_text">
1198
1199 <p>The Select phase is the bulk of the target-specific code for instruction
1200    selection.  This phase takes a legal SelectionDAG as input, pattern matches
1201    the instructions supported by the target to this DAG, and produces a new DAG
1202    of target code.  For example, consider the following LLVM fragment:</p>
1203
1204 <div class="doc_code">
1205 <pre>
1206 %t1 = fadd float %W, %X
1207 %t2 = fmul float %t1, %Y
1208 %t3 = fadd float %t2, %Z
1209 </pre>
1210 </div>
1211
1212 <p>This LLVM code corresponds to a SelectionDAG that looks basically like
1213    this:</p>
1214
1215 <div class="doc_code">
1216 <pre>
1217 (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
1218 </pre>
1219 </div>
1220
1221 <p>If a target supports floating point multiply-and-add (FMA) operations, one of
1222    the adds can be merged with the multiply.  On the PowerPC, for example, the
1223    output of the instruction selector might look like this DAG:</p>
1224
1225 <div class="doc_code">
1226 <pre>
1227 (FMADDS (FADDS W, X), Y, Z)
1228 </pre>
1229 </div>
1230
1231 <p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
1232 first two operands and adds the third (as single-precision floating-point
1233 numbers).  The <tt>FADDS</tt> instruction is a simple binary single-precision
1234 add instruction.  To perform this pattern match, the PowerPC backend includes
1235 the following instruction definitions:</p>
1236
1237 <div class="doc_code">
1238 <pre>
1239 def FMADDS : AForm_1&lt;59, 29,
1240                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
1241                     "fmadds $FRT, $FRA, $FRC, $FRB",
1242                     [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
1243                                            F4RC:$FRB))</b>]&gt;;
1244 def FADDS : AForm_2&lt;59, 21,
1245                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
1246                     "fadds $FRT, $FRA, $FRB",
1247                     [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
1248 </pre>
1249 </div>
1250
1251 <p>The portion of the instruction definition in bold indicates the pattern used
1252    to match the instruction.  The DAG operators
1253    (like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
1254    the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file.  "
1255    <tt>F4RC</tt>" is the register class of the input and result values.</p>
1256
1257 <p>The TableGen DAG instruction selector generator reads the instruction
1258    patterns in the <tt>.td</tt> file and automatically builds parts of the
1259    pattern matching code for your target.  It has the following strengths:</p>
1260
1261 <ul>
1262   <li>At compiler-compiler time, it analyzes your instruction patterns and tells
1263       you if your patterns make sense or not.</li>
1264
1265   <li>It can handle arbitrary constraints on operands for the pattern match.  In
1266       particular, it is straight-forward to say things like "match any immediate
1267       that is a 13-bit sign-extended value".  For examples, see the
1268       <tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
1269       backend.</li>
1270
1271   <li>It knows several important identities for the patterns defined.  For
1272       example, it knows that addition is commutative, so it allows the
1273       <tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
1274       well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
1275       to specially handle this case.</li>
1276
1277   <li>It has a full-featured type-inferencing system.  In particular, you should
1278       rarely have to explicitly tell the system what type parts of your patterns
1279       are.  In the <tt>FMADDS</tt> case above, we didn't have to tell
1280       <tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
1281       It was able to infer and propagate this knowledge from the fact that
1282       <tt>F4RC</tt> has type 'f32'.</li>
1283
1284   <li>Targets can define their own (and rely on built-in) "pattern fragments".
1285       Pattern fragments are chunks of reusable patterns that get inlined into
1286       your patterns during compiler-compiler time.  For example, the integer
1287       "<tt>(not x)</tt>" operation is actually defined as a pattern fragment
1288       that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
1289       have a native '<tt>not</tt>' operation.  Targets can define their own
1290       short-hand fragments as they see fit.  See the definition of
1291       '<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
1292
1293   <li>In addition to instructions, targets can specify arbitrary patterns that
1294       map to one or more instructions using the 'Pat' class.  For example, the
1295       PowerPC has no way to load an arbitrary integer immediate into a register
1296       in one instruction. To tell tblgen how to do this, it defines:
1297       <br>
1298       <br>
1299 <div class="doc_code">
1300 <pre>
1301 // Arbitrary immediate support.  Implement in terms of LIS/ORI.
1302 def : Pat&lt;(i32 imm:$imm),
1303           (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
1304 </pre>
1305 </div>
1306       <br>
1307       If none of the single-instruction patterns for loading an immediate into a
1308       register match, this will be used.  This rule says "match an arbitrary i32
1309       immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
1310       an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
1311       the left 16 bits') instruction".  To make this work, the
1312       <tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
1313       the input immediate (in this case, take the high or low 16-bits of the
1314       immediate).</li>
1315
1316   <li>While the system does automate a lot, it still allows you to write custom
1317       C++ code to match special cases if there is something that is hard to
1318       express.</li>
1319 </ul>
1320
1321 <p>While it has many strengths, the system currently has some limitations,
1322    primarily because it is a work in progress and is not yet finished:</p>
1323
1324 <ul>
1325   <li>Overall, there is no way to define or match SelectionDAG nodes that define
1326       multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
1327       etc).  This is the biggest reason that you currently still <em>have
1328       to</em> write custom C++ code for your instruction selector.</li>
1329
1330   <li>There is no great way to support matching complex addressing modes yet.
1331       In the future, we will extend pattern fragments to allow them to define
1332       multiple values (e.g. the four operands of the <a href="#x86_memory">X86
1333       addressing mode</a>, which are currently matched with custom C++ code).
1334       In addition, we'll extend fragments so that a fragment can match multiple
1335       different patterns.</li>
1336
1337   <li>We don't automatically infer flags like isStore/isLoad yet.</li>
1338
1339   <li>We don't automatically generate the set of supported registers and
1340       operations for the <a href="#selectiondag_legalize">Legalizer</a>
1341       yet.</li>
1342
1343   <li>We don't have a way of tying in custom legalized nodes yet.</li>
1344 </ul>
1345
1346 <p>Despite these limitations, the instruction selector generator is still quite
1347    useful for most of the binary and logical operations in typical instruction
1348    sets.  If you run into any problems or can't figure out how to do something,
1349    please let Chris know!</p>
1350
1351 </div>
1352
1353 <!-- _______________________________________________________________________ -->
1354 <div class="doc_subsubsection">
1355   <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
1356 </div>
1357
1358 <div class="doc_text">
1359
1360 <p>The scheduling phase takes the DAG of target instructions from the selection
1361    phase and assigns an order.  The scheduler can pick an order depending on
1362    various constraints of the machines (i.e. order for minimal register pressure
1363    or try to cover instruction latencies).  Once an order is established, the
1364    DAG is converted to a list
1365    of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
1366    destroyed.</p>
1367
1368 <p>Note that this phase is logically separate from the instruction selection
1369    phase, but is tied to it closely in the code because it operates on
1370    SelectionDAGs.</p>
1371
1372 </div>
1373
1374 <!-- _______________________________________________________________________ -->
1375 <div class="doc_subsubsection">
1376   <a name="selectiondag_future">Future directions for the SelectionDAG</a>
1377 </div>
1378
1379 <div class="doc_text">
1380
1381 <ol>
1382   <li>Optional function-at-a-time selection.</li>
1383
1384   <li>Auto-generate entire selector from <tt>.td</tt> file.</li>
1385 </ol>
1386
1387 </div>
1388
1389 <!-- ======================================================================= -->
1390 <div class="doc_subsection">
1391   <a name="ssamco">SSA-based Machine Code Optimizations</a>
1392 </div>
1393 <div class="doc_text"><p>To Be Written</p></div>
1394
1395 <!-- ======================================================================= -->
1396 <div class="doc_subsection">
1397   <a name="liveintervals">Live Intervals</a>
1398 </div>
1399
1400 <div class="doc_text">
1401
1402 <p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
1403    They are used by some <a href="#regalloc">register allocator</a> passes to
1404    determine if two or more virtual registers which require the same physical
1405    register are live at the same point in the program (i.e., they conflict).
1406    When this situation occurs, one virtual register must be <i>spilled</i>.</p>
1407
1408 </div>
1409
1410 <!-- _______________________________________________________________________ -->
1411 <div class="doc_subsubsection">
1412   <a name="livevariable_analysis">Live Variable Analysis</a>
1413 </div>
1414
1415 <div class="doc_text">
1416
1417 <p>The first step in determining the live intervals of variables is to calculate
1418    the set of registers that are immediately dead after the instruction (i.e.,
1419    the instruction calculates the value, but it is never used) and the set of
1420    registers that are used by the instruction, but are never used after the
1421    instruction (i.e., they are killed). Live variable information is computed
1422    for each <i>virtual</i> register and <i>register allocatable</i> physical
1423    register in the function.  This is done in a very efficient manner because it
1424    uses SSA to sparsely compute lifetime information for virtual registers
1425    (which are in SSA form) and only has to track physical registers within a
1426    block.  Before register allocation, LLVM can assume that physical registers
1427    are only live within a single basic block.  This allows it to do a single,
1428    local analysis to resolve physical register lifetimes within each basic
1429    block. If a physical register is not register allocatable (e.g., a stack
1430    pointer or condition codes), it is not tracked.</p>
1431
1432 <p>Physical registers may be live in to or out of a function. Live in values are
1433    typically arguments in registers. Live out values are typically return values
1434    in registers. Live in values are marked as such, and are given a dummy
1435    "defining" instruction during live intervals analysis. If the last basic
1436    block of a function is a <tt>return</tt>, then it's marked as using all live
1437    out values in the function.</p>
1438
1439 <p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
1440    the live variable information from a depth first traversal of the CFG of the
1441    function won't guarantee that a virtual register used by the <tt>PHI</tt>
1442    node is defined before it's used. When a <tt>PHI</tt> node is encountered,
1443    only the definition is handled, because the uses will be handled in other
1444    basic blocks.</p>
1445
1446 <p>For each <tt>PHI</tt> node of the current basic block, we simulate an
1447    assignment at the end of the current basic block and traverse the successor
1448    basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
1449    the <tt>PHI</tt> node's operands is coming from the current basic block, then
1450    the variable is marked as <i>alive</i> within the current basic block and all
1451    of its predecessor basic blocks, until the basic block with the defining
1452    instruction is encountered.</p>
1453
1454 </div>
1455
1456 <!-- _______________________________________________________________________ -->
1457 <div class="doc_subsubsection">
1458   <a name="liveintervals_analysis">Live Intervals Analysis</a>
1459 </div>
1460
1461 <div class="doc_text">
1462
1463 <p>We now have the information available to perform the live intervals analysis
1464    and build the live intervals themselves.  We start off by numbering the basic
1465    blocks and machine instructions.  We then handle the "live-in" values.  These
1466    are in physical registers, so the physical register is assumed to be killed
1467    by the end of the basic block.  Live intervals for virtual registers are
1468    computed for some ordering of the machine instructions <tt>[1, N]</tt>.  A
1469    live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
1470    &lt; N</tt>, for which a variable is live.</p>
1471
1472 <p><i><b>More to come...</b></i></p>
1473
1474 </div>
1475
1476 <!-- ======================================================================= -->
1477 <div class="doc_subsection">
1478   <a name="regalloc">Register Allocation</a>
1479 </div>
1480
1481 <div class="doc_text">
1482
1483 <p>The <i>Register Allocation problem</i> consists in mapping a program
1484    <i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
1485    to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
1486    number of physical registers. Each target architecture has a different number
1487    of physical registers. If the number of physical registers is not enough to
1488    accommodate all the virtual registers, some of them will have to be mapped
1489    into memory. These virtuals are called <i>spilled virtuals</i>.</p>
1490
1491 </div>
1492
1493 <!-- _______________________________________________________________________ -->
1494
1495 <div class="doc_subsubsection">
1496   <a name="regAlloc_represent">How registers are represented in LLVM</a>
1497 </div>
1498
1499 <div class="doc_text">
1500
1501 <p>In LLVM, physical registers are denoted by integer numbers that normally
1502    range from 1 to 1023. To see how this numbering is defined for a particular
1503    architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
1504    architecture. For instance, by
1505    inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
1506    32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
1507    <tt>MM0</tt> is mapped to 48.</p>
1508
1509 <p>Some architectures contain registers that share the same physical location. A
1510    notable example is the X86 platform. For instance, in the X86 architecture,
1511    the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
1512    bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
1513    particular architecture, you can check which registers are aliased by
1514    inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
1515    <tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
1516    all the physical registers aliased to the register <tt>p_reg</tt>.</p>
1517
1518 <p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
1519    Elements in the same register class are functionally equivalent, and can be
1520    interchangeably used. Each virtual register can only be mapped to physical
1521    registers of a particular class. For instance, in the X86 architecture, some
1522    virtuals can only be allocated to 8 bit registers.  A register class is
1523    described by <tt>TargetRegisterClass</tt> objects.  To discover if a virtual
1524    register is compatible with a given physical, this code can be used:</p>
1525
1526 <div class="doc_code">
1527 <pre>
1528 bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
1529                                       unsigned v_reg,
1530                                       unsigned p_reg) {
1531   assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
1532          "Target register must be physical");
1533   const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
1534   return trc-&gt;contains(p_reg);
1535 }
1536 </pre>
1537 </div>
1538
1539 <p>Sometimes, mostly for debugging purposes, it is useful to change the number
1540    of physical registers available in the target architecture. This must be done
1541    statically, inside the <tt>TargetRegsterInfo.td</tt> file. Just <tt>grep</tt>
1542    for <tt>RegisterClass</tt>, the last parameter of which is a list of
1543    registers. Just commenting some out is one simple way to avoid them being
1544    used. A more polite way is to explicitly exclude some registers from
1545    the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
1546    class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
1547    </p>
1548
1549 <p>Virtual registers are also denoted by integer numbers. Contrary to physical
1550    registers, different virtual registers never share the same number. The
1551    smallest virtual register is normally assigned the number 1024. This may
1552    change, so, in order to know which is the first virtual register, you should
1553    access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
1554    number is greater than or equal
1555    to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
1556    register. Whereas physical registers are statically defined in
1557    a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
1558    application developer, that is not the case with virtual registers.  In order
1559    to create new virtual registers, use the
1560    method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
1561    will return a virtual register with the highest code.</p>
1562
1563 <p>Before register allocation, the operands of an instruction are mostly virtual
1564    registers, although physical registers may also be used. In order to check if
1565    a given machine operand is a register, use the boolean
1566    function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
1567    a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
1568    or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
1569    defines the registers 1024, and uses registers 1025 and 1026. Given a
1570    register operand, the method <tt>MachineOperand::isUse()</tt> informs if that
1571    register is being used by the instruction. The
1572    method <tt>MachineOperand::isDef()</tt> informs if that registers is being
1573    defined.</p>
1574
1575 <p>We will call physical registers present in the LLVM bitcode before register
1576    allocation <i>pre-colored registers</i>. Pre-colored registers are used in
1577    many different situations, for instance, to pass parameters of functions
1578    calls, and to store results of particular instructions. There are two types
1579    of pre-colored registers: the ones <i>implicitly</i> defined, and
1580    those <i>explicitly</i> defined. Explicitly defined registers are normal
1581    operands, and can be accessed
1582    with <tt>MachineInstr::getOperand(int)::getReg()</tt>.  In order to check
1583    which registers are implicitly defined by an instruction, use
1584    the <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
1585    where <tt>opcode</tt> is the opcode of the target instruction. One important
1586    difference between explicit and implicit physical registers is that the
1587    latter are defined statically for each instruction, whereas the former may
1588    vary depending on the program being compiled. For example, an instruction
1589    that represents a function call will always implicitly define or use the same
1590    set of physical registers. To read the registers implicitly used by an
1591    instruction,
1592    use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
1593    registers impose constraints on any register allocation algorithm. The
1594    register allocator must make sure that none of them are overwritten by
1595    the values of virtual registers while still alive.</p>
1596
1597 </div>
1598
1599 <!-- _______________________________________________________________________ -->
1600
1601 <div class="doc_subsubsection">
1602   <a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
1603 </div>
1604
1605 <div class="doc_text">
1606
1607 <p>There are two ways to map virtual registers to physical registers (or to
1608    memory slots). The first way, that we will call <i>direct mapping</i>, is
1609    based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
1610    and <tt>MachineOperand</tt>. The second way, that we will call <i>indirect
1611    mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
1612    and stores sending and getting values to and from memory.</p>
1613
1614 <p>The direct mapping provides more flexibility to the developer of the register
1615    allocator; however, it is more error prone, and demands more implementation
1616    work.  Basically, the programmer will have to specify where load and store
1617    instructions should be inserted in the target function being compiled in
1618    order to get and store values in memory. To assign a physical register to a
1619    virtual register present in a given operand,
1620    use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
1621    use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
1622    load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>
1623
1624 <p>The indirect mapping shields the application developer from the complexities
1625    of inserting load and store instructions. In order to map a virtual register
1626    to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>.  In
1627    order to map a certain virtual register to memory,
1628    use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
1629    the stack slot where <tt>vreg</tt>'s value will be located.  If it is
1630    necessary to map another virtual register to the same stack slot,
1631    use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
1632    important point to consider when using the indirect mapping, is that even if
1633    a virtual register is mapped to memory, it still needs to be mapped to a
1634    physical register. This physical register is the location where the virtual
1635    register is supposed to be found before being stored or after being
1636    reloaded.</p>
1637
1638 <p>If the indirect strategy is used, after all the virtual registers have been
1639    mapped to physical registers or stack slots, it is necessary to use a spiller
1640    object to place load and store instructions in the code. Every virtual that
1641    has been mapped to a stack slot will be stored to memory after been defined
1642    and will be loaded before being used. The implementation of the spiller tries
1643    to recycle load/store instructions, avoiding unnecessary instructions. For an
1644    example of how to invoke the spiller,
1645    see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
1646    in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
1647
1648 </div>
1649
1650 <!-- _______________________________________________________________________ -->
1651 <div class="doc_subsubsection">
1652   <a name="regAlloc_twoAddr">Handling two address instructions</a>
1653 </div>
1654
1655 <div class="doc_text">
1656
1657 <p>With very rare exceptions (e.g., function calls), the LLVM machine code
1658    instructions are three address instructions. That is, each instruction is
1659    expected to define at most one register, and to use at most two registers.
1660    However, some architectures use two address instructions. In this case, the
1661    defined register is also one of the used register. For instance, an
1662    instruction such as <tt>ADD %EAX, %EBX</tt>, in X86 is actually equivalent
1663    to <tt>%EAX = %EAX + %EBX</tt>.</p>
1664
1665 <p>In order to produce correct code, LLVM must convert three address
1666    instructions that represent two address instructions into true two address
1667    instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
1668    this specific purpose. It must be run before register allocation takes
1669    place. After its execution, the resulting code may no longer be in SSA
1670    form. This happens, for instance, in situations where an instruction such
1671    as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
1672
1673 <div class="doc_code">
1674 <pre>
1675 %a = MOVE %b
1676 %a = ADD %a %c
1677 </pre>
1678 </div>
1679
1680 <p>Notice that, internally, the second instruction is represented as
1681    <tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
1682    used and defined by the instruction.</p>
1683
1684 </div>
1685
1686 <!-- _______________________________________________________________________ -->
1687 <div class="doc_subsubsection">
1688   <a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
1689 </div>
1690
1691 <div class="doc_text">
1692
1693 <p>An important transformation that happens during register allocation is called
1694    the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
1695    that are performed on the control flow graph of programs. However,
1696    traditional instruction sets do not implement PHI instructions. Thus, in
1697    order to generate executable code, compilers must replace PHI instructions
1698    with other instructions that preserve their semantics.</p>
1699
1700 <p>There are many ways in which PHI instructions can safely be removed from the
1701    target code. The most traditional PHI deconstruction algorithm replaces PHI
1702    instructions with copy instructions. That is the strategy adopted by
1703    LLVM. The SSA deconstruction algorithm is implemented
1704    in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
1705    identifier <tt>PHIEliminationID</tt> must be marked as required in the code
1706    of the register allocator.</p>
1707
1708 </div>
1709
1710 <!-- _______________________________________________________________________ -->
1711 <div class="doc_subsubsection">
1712   <a name="regAlloc_fold">Instruction folding</a>
1713 </div>
1714
1715 <div class="doc_text">
1716
1717 <p><i>Instruction folding</i> is an optimization performed during register
1718    allocation that removes unnecessary copy instructions. For instance, a
1719    sequence of instructions such as:</p>
1720
1721 <div class="doc_code">
1722 <pre>
1723 %EBX = LOAD %mem_address
1724 %EAX = COPY %EBX
1725 </pre>
1726 </div>
1727
1728 <p>can be safely substituted by the single instruction:</p>
1729
1730 <div class="doc_code">
1731 <pre>
1732 %EAX = LOAD %mem_address
1733 </pre>
1734 </div>
1735
1736 <p>Instructions can be folded with
1737    the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
1738    taken when folding instructions; a folded instruction can be quite different
1739    from the original
1740    instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
1741    in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
1742    use.</p>
1743
1744 </div>
1745
1746 <!-- _______________________________________________________________________ -->
1747
1748 <div class="doc_subsubsection">
1749   <a name="regAlloc_builtIn">Built in register allocators</a>
1750 </div>
1751
1752 <div class="doc_text">
1753
1754 <p>The LLVM infrastructure provides the application developer with three
1755    different register allocators:</p>
1756
1757 <ul>
1758   <li><i>Linear Scan</i> &mdash; <i>The default allocator</i>. This is the
1759       well-know linear scan register allocator. Whereas the
1760       <i>Simple</i> and <i>Local</i> algorithms use a direct mapping
1761       implementation technique, the <i>Linear Scan</i> implementation
1762       uses a spiller in order to place load and stores.</li>
1763
1764   <li><i>Fast</i> &mdash; This register allocator is the default for debug
1765       builds. It allocates registers on a basic block level, attempting to keep
1766       values in registers and reusing registers as appropriate.</li>
1767
1768   <li><i>PBQP</i> &mdash; A Partitioned Boolean Quadratic Programming (PBQP)
1769       based register allocator. This allocator works by constructing a PBQP
1770       problem representing the register allocation problem under consideration,
1771       solving this using a PBQP solver, and mapping the solution back to a
1772       register assignment.</li>
1773
1774 </ul>
1775
1776 <p>The type of register allocator used in <tt>llc</tt> can be chosen with the
1777    command line option <tt>-regalloc=...</tt>:</p>
1778
1779 <div class="doc_code">
1780 <pre>
1781 $ llc -regalloc=linearscan file.bc -o ln.s;
1782 $ llc -regalloc=fast file.bc -o fa.s;
1783 $ llc -regalloc=pbqp file.bc -o pbqp.s;
1784 </pre>
1785 </div>
1786
1787 </div>
1788
1789 <!-- ======================================================================= -->
1790 <div class="doc_subsection">
1791   <a name="proepicode">Prolog/Epilog Code Insertion</a>
1792 </div>
1793 <div class="doc_text"><p>To Be Written</p></div>
1794 <!-- ======================================================================= -->
1795 <div class="doc_subsection">
1796   <a name="latemco">Late Machine Code Optimizations</a>
1797 </div>
1798 <div class="doc_text"><p>To Be Written</p></div>
1799
1800 <!-- ======================================================================= -->
1801 <div class="doc_subsection">
1802   <a name="codeemit">Code Emission</a>
1803 </div>
1804
1805 <div class="doc_text">
1806
1807 <p>The code emission step of code generation is responsible for lowering from
1808 the code generator abstractions (like <a
1809 href="#machinefunction">MachineFunction</a>, <a
1810 href="#machineinstr">MachineInstr</a>, etc) down
1811 to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
1812 <a href="#mcstreamer">MCStreamer</a>, etc).  This is
1813 done with a combination of several different classes: the (misnamed)
1814 target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
1815 (such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
1816
1817 <p>Since the MC layer works at the level of abstraction of object files, it
1818 doesn't have a notion of functions, global variables etc.  Instead, it thinks
1819 about labels, directives, and instructions.  A key class used at this time is
1820 the MCStreamer class.  This is an abstract API that is implemented in different
1821 ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
1822 an "assembler API".  MCStreamer has one method per directive, such as EmitLabel,
1823 EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
1824 level directives.
1825 </p>
1826
1827 <p>If you are interested in implementing a code generator for a target, there
1828 are three important things that you have to implement for your target:</p>
1829
1830 <ol>
1831 <li>First, you need a subclass of AsmPrinter for your target.  This class
1832 implements the general lowering process converting MachineFunction's into MC
1833 label constructs.  The AsmPrinter base class provides a number of useful methods
1834 and routines, and also allows you to override the lowering process in some
1835 important ways.  You should get much of the lowering for free if you are
1836 implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
1837 class implements much of the common logic.</li>
1838
1839 <li>Second, you need to implement an instruction printer for your target.  The
1840 instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
1841 raw_ostream as text.  Most of this is automatically generated from the .td file
1842 (when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
1843 instructions), but you need to implement routines to print operands.</li>
1844
1845 <li>Third, you need to implement code that lowers a <a
1846 href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
1847 "&lt;target&gt;MCInstLower.cpp".  This lowering process is often target
1848 specific, and is responsible for turning jump table entries, constant pool
1849 indices, global variable addresses, etc into MCLabels as appropriate.  This
1850 translation layer is also responsible for expanding pseudo ops used by the code
1851 generator into the actual machine instructions they correspond to. The MCInsts
1852 that are generated by this are fed into the instruction printer or the encoder.
1853 </li>
1854
1855 </ol>
1856
1857 <p>Finally, at your choosing, you can also implement an subclass of
1858 MCCodeEmitter which lowers MCInst's into machine code bytes and relocations.
1859 This is important if you want to support direct .o file emission, or would like
1860 to implement an assembler for your target.</p>
1861
1862 </div>
1863
1864
1865 <!-- *********************************************************************** -->
1866 <div class="doc_section">
1867   <a name="nativeassembler">Implementing a Native Assembler</a>
1868 </div>
1869 <!-- *********************************************************************** -->
1870
1871 <div class="doc_text">
1872
1873 <p>Though you're probably reading this because you want to write or maintain a
1874 compiler backend, LLVM also fully supports building a native assemblers too.
1875 We've tried hard to automate the generation of the assembler from the .td files
1876 (in particular the instruction syntax and encodings), which means that a large
1877 part of the manual and repetitive data entry can be factored and shared with the
1878 compiler.</p>
1879
1880
1881
1882 </div>
1883
1884
1885 <!-- ======================================================================= -->
1886 <div class="doc_subsection">
1887   <a name="proepicode">Prolog/Epilog Code Insertion</a>
1888 </div>
1889 <div class="doc_text"><p>To Be Written</p></div>
1890
1891
1892
1893
1894 <!-- *********************************************************************** -->
1895 <div class="doc_section">
1896   <a name="targetimpls">Target-specific Implementation Notes</a>
1897 </div>
1898 <!-- *********************************************************************** -->
1899
1900 <div class="doc_text">
1901
1902 <p>This section of the document explains features or design decisions that are
1903    specific to the code generator for a particular target.</p>
1904
1905 </div>
1906
1907 <!-- ======================================================================= -->
1908 <div class="doc_subsection">
1909   <a name="tailcallopt">Tail call optimization</a>
1910 </div>
1911
1912 <div class="doc_text">
1913
1914 <p>Tail call optimization, callee reusing the stack of the caller, is currently
1915    supported on x86/x86-64 and PowerPC. It is performed if:</p>
1916
1917 <ul>
1918   <li>Caller and callee have the calling convention <tt>fastcc</tt> or
1919        <tt>cc 10</tt> (GHC call convention).</li>
1920
1921   <li>The call is a tail call - in tail position (ret immediately follows call
1922       and ret uses value of call or is void).</li>
1923
1924   <li>Option <tt>-tailcallopt</tt> is enabled.</li>
1925
1926   <li>Platform specific constraints are met.</li>
1927 </ul>
1928
1929 <p>x86/x86-64 constraints:</p>
1930
1931 <ul>
1932   <li>No variable argument lists are used.</li>
1933
1934   <li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
1935   = hidden or protected) are supported.</li>
1936 </ul>
1937
1938 <p>PowerPC constraints:</p>
1939
1940 <ul>
1941   <li>No variable argument lists are used.</li>
1942
1943   <li>No byval parameters are used.</li>
1944
1945   <li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
1946 </ul>
1947
1948 <p>Example:</p>
1949
1950 <p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
1951
1952 <div class="doc_code">
1953 <pre>
1954 declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
1955
1956 define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
1957   %l1 = add i32 %in1, %in2
1958   %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
1959   ret i32 %tmp
1960 }
1961 </pre>
1962 </div>
1963
1964 <p>Implications of <tt>-tailcallopt</tt>:</p>
1965
1966 <p>To support tail call optimization in situations where the callee has more
1967    arguments than the caller a 'callee pops arguments' convention is used. This
1968    currently causes each <tt>fastcc</tt> call that is not tail call optimized
1969    (because one or more of above constraints are not met) to be followed by a
1970    readjustment of the stack. So performance might be worse in such cases.</p>
1971
1972 </div>
1973 <!-- ======================================================================= -->
1974 <div class="doc_subsection">
1975   <a name="sibcallopt">Sibling call optimization</a>
1976 </div>
1977
1978 <div class="doc_text">
1979
1980 <p>Sibling call optimization is a restricted form of tail call optimization.
1981    Unlike tail call optimization described in the previous section, it can be
1982    performed automatically on any tail calls when <tt>-tailcallopt</tt> option
1983    is not specified.</p>
1984
1985 <p>Sibling call optimization is currently performed on x86/x86-64 when the
1986    following constraints are met:</p>
1987
1988 <ul>
1989   <li>Caller and callee have the same calling convention. It can be either
1990       <tt>c</tt> or <tt>fastcc</tt>.
1991
1992   <li>The call is a tail call - in tail position (ret immediately follows call
1993       and ret uses value of call or is void).</li>
1994
1995   <li>Caller and callee have matching return type or the callee result is not
1996       used.
1997
1998   <li>If any of the callee arguments are being passed in stack, they must be
1999       available in caller's own incoming argument stack and the frame offsets
2000       must be the same.
2001 </ul>
2002
2003 <p>Example:</p>
2004 <div class="doc_code">
2005 <pre>
2006 declare i32 @bar(i32, i32)
2007
2008 define i32 @foo(i32 %a, i32 %b, i32 %c) {
2009 entry:
2010   %0 = tail call i32 @bar(i32 %a, i32 %b)
2011   ret i32 %0
2012 }
2013 </pre>
2014 </div>
2015
2016 </div>
2017 <!-- ======================================================================= -->
2018 <div class="doc_subsection">
2019   <a name="x86">The X86 backend</a>
2020 </div>
2021
2022 <div class="doc_text">
2023
2024 <p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory.  This
2025    code generator is capable of targeting a variety of x86-32 and x86-64
2026    processors, and includes support for ISA extensions such as MMX and SSE.</p>
2027
2028 </div>
2029
2030 <!-- _______________________________________________________________________ -->
2031 <div class="doc_subsubsection">
2032   <a name="x86_tt">X86 Target Triples supported</a>
2033 </div>
2034
2035 <div class="doc_text">
2036
2037 <p>The following are the known target triples that are supported by the X86
2038    backend.  This is not an exhaustive list, and it would be useful to add those
2039    that people test.</p>
2040
2041 <ul>
2042   <li><b>i686-pc-linux-gnu</b> &mdash; Linux</li>
2043
2044   <li><b>i386-unknown-freebsd5.3</b> &mdash; FreeBSD 5.3</li>
2045
2046   <li><b>i686-pc-cygwin</b> &mdash; Cygwin on Win32</li>
2047
2048   <li><b>i686-pc-mingw32</b> &mdash; MingW on Win32</li>
2049
2050   <li><b>i386-pc-mingw32msvc</b> &mdash; MingW crosscompiler on Linux</li>
2051
2052   <li><b>i686-apple-darwin*</b> &mdash; Apple Darwin on X86</li>
2053
2054   <li><b>x86_64-unknown-linux-gnu</b> &mdash; Linux</li>
2055 </ul>
2056
2057 </div>
2058
2059 <!-- _______________________________________________________________________ -->
2060 <div class="doc_subsubsection">
2061   <a name="x86_cc">X86 Calling Conventions supported</a>
2062 </div>
2063
2064
2065 <div class="doc_text">
2066
2067 <p>The following target-specific calling conventions are known to backend:</p>
2068
2069 <ul>
2070   <li><b>x86_StdCall</b> &mdash; stdcall calling convention seen on Microsoft
2071       Windows platform (CC ID = 64).</li>
2072
2073   <li><b>x86_FastCall</b> &mdash; fastcall calling convention seen on Microsoft
2074       Windows platform (CC ID = 65).</li>
2075 </ul>
2076
2077 </div>
2078
2079 <!-- _______________________________________________________________________ -->
2080 <div class="doc_subsubsection">
2081   <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
2082 </div>
2083
2084 <div class="doc_text">
2085
2086 <p>The x86 has a very flexible way of accessing memory.  It is capable of
2087    forming memory addresses of the following expression directly in integer
2088    instructions (which use ModR/M addressing):</p>
2089
2090 <div class="doc_code">
2091 <pre>
2092 SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
2093 </pre>
2094 </div>
2095
2096 <p>In order to represent this, LLVM tracks no less than 5 operands for each
2097    memory operand of this form.  This means that the "load" form of
2098    '<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
2099
2100 <div class="doc_code">
2101 <pre>
2102 Index:        0     |    1        2       3           4          5
2103 Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
2104 OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
2105 </pre>
2106 </div>
2107
2108 <p>Stores, and all other instructions, treat the four memory operands in the
2109    same way and in the same order.  If the segment register is unspecified
2110    (regno = 0), then no segment override is generated.  "Lea" operations do not
2111    have a segment register specified, so they only have 4 operands for their
2112    memory reference.</p>
2113
2114 </div>
2115
2116 <!-- _______________________________________________________________________ -->
2117 <div class="doc_subsubsection">
2118   <a name="x86_memory">X86 address spaces supported</a>
2119 </div>
2120
2121 <div class="doc_text">
2122
2123 <p>x86 has an experimental feature which provides
2124    the ability to perform loads and stores to different address spaces
2125    via the x86 segment registers.  A segment override prefix byte on an
2126    instruction causes the instruction's memory access to go to the specified
2127    segment.  LLVM address space 0 is the default address space, which includes
2128    the stack, and any unqualified memory accesses in a program.  Address spaces
2129    1-255 are currently reserved for user-defined code.  The GS-segment is
2130    represented by address space 256, while the FS-segment is represented by
2131    address space 257. Other x86 segments have yet to be allocated address space
2132    numbers.</p>
2133
2134 <p>While these address spaces may seem similar to TLS via the
2135    <tt>thread_local</tt> keyword, and often use the same underlying hardware,
2136    there are some fundamental differences.</p>
2137
2138 <p>The <tt>thread_local</tt> keyword applies to global variables and
2139    specifies that they are to be allocated in thread-local memory. There are
2140    no type qualifiers involved, and these variables can be pointed to with
2141    normal pointers and accessed with normal loads and stores.
2142    The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
2143    level (though LLVM doesn't yet have implementations of it for some
2144    configurations).<p>
2145
2146 <p>Special address spaces, in contrast, apply to static types. Every
2147    load and store has a particular address space in its address operand type,
2148    and this is what determines which address space is accessed.
2149    LLVM ignores these special address space qualifiers on global variables,
2150    and does not provide a way to directly allocate storage in them.
2151    At the LLVM IR level, the behavior of these special address spaces depends
2152    in part on the underlying OS or runtime environment, and they are specific
2153    to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
2154
2155 <p>Some operating systems and runtime environments use (or may in the future
2156    use) the FS/GS-segment registers for various low-level purposes, so care
2157    should be taken when considering them.</p>
2158
2159 </div>
2160
2161 <!-- _______________________________________________________________________ -->
2162 <div class="doc_subsubsection">
2163   <a name="x86_names">Instruction naming</a>
2164 </div>
2165
2166 <div class="doc_text">
2167
2168 <p>An instruction name consists of the base name, a default operand size, and a
2169    a character per operand with an optional special size. For example:</p>
2170
2171 <div class="doc_code">
2172 <pre>
2173 ADD8rr      -&gt; add, 8-bit register, 8-bit register
2174 IMUL16rmi   -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate
2175 IMUL16rmi8  -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate
2176 MOVSX32rm16 -&gt; movsx, 32-bit register, 16-bit memory
2177 </pre>
2178 </div>
2179
2180 </div>
2181
2182 <!-- ======================================================================= -->
2183 <div class="doc_subsection">
2184   <a name="ppc">The PowerPC backend</a>
2185 </div>
2186
2187 <div class="doc_text">
2188
2189 <p>The PowerPC code generator lives in the lib/Target/PowerPC directory.  The
2190    code generation is retargetable to several variations or <i>subtargets</i> of
2191    the PowerPC ISA; including ppc32, ppc64 and altivec.</p>
2192
2193 </div>
2194
2195 <!-- _______________________________________________________________________ -->
2196 <div class="doc_subsubsection">
2197   <a name="ppc_abi">LLVM PowerPC ABI</a>
2198 </div>
2199
2200 <div class="doc_text">
2201
2202 <p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
2203    relative (PIC) or static addressing for accessing global values, so no TOC
2204    (r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
2205    of a stack frame.  LLVM takes advantage of having no TOC to provide space to
2206    save the frame pointer in the PowerPC linkage area of the caller frame.
2207    Other details of PowerPC ABI can be found at <a href=
2208    "http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
2209    >PowerPC ABI.</a> Note: This link describes the 32 bit ABI.  The 64 bit ABI
2210    is similar except space for GPRs are 8 bytes wide (not 4) and r13 is reserved
2211    for system use.</p>
2212
2213 </div>
2214
2215 <!-- _______________________________________________________________________ -->
2216 <div class="doc_subsubsection">
2217   <a name="ppc_frame">Frame Layout</a>
2218 </div>
2219
2220 <div class="doc_text">
2221
2222 <p>The size of a PowerPC frame is usually fixed for the duration of a
2223    function's invocation.  Since the frame is fixed size, all references
2224    into the frame can be accessed via fixed offsets from the stack pointer.  The
2225    exception to this is when dynamic alloca or variable sized arrays are
2226    present, then a base pointer (r31) is used as a proxy for the stack pointer
2227    and stack pointer is free to grow or shrink.  A base pointer is also used if
2228    llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
2229    always aligned to 16 bytes, so that space allocated for altivec vectors will
2230    be properly aligned.</p>
2231
2232 <p>An invocation frame is laid out as follows (low memory at top);</p>
2233
2234 <table class="layout">
2235   <tr>
2236     <td>Linkage<br><br></td>
2237   </tr>
2238   <tr>
2239     <td>Parameter area<br><br></td>
2240   </tr>
2241   <tr>
2242     <td>Dynamic area<br><br></td>
2243   </tr>
2244   <tr>
2245     <td>Locals area<br><br></td>
2246   </tr>
2247   <tr>
2248     <td>Saved registers area<br><br></td>
2249   </tr>
2250   <tr style="border-style: none hidden none hidden;">
2251     <td><br></td>
2252   </tr>
2253   <tr>
2254     <td>Previous Frame<br><br></td>
2255   </tr>
2256 </table>
2257
2258 <p>The <i>linkage</i> area is used by a callee to save special registers prior
2259    to allocating its own frame.  Only three entries are relevant to LLVM. The
2260    first entry is the previous stack pointer (sp), aka link.  This allows
2261    probing tools like gdb or exception handlers to quickly scan the frames in
2262    the stack.  A function epilog can also use the link to pop the frame from the
2263    stack.  The third entry in the linkage area is used to save the return
2264    address from the lr register. Finally, as mentioned above, the last entry is
2265    used to save the previous frame pointer (r31.)  The entries in the linkage
2266    area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
2267    mode and 48 bytes in 64 bit mode.</p>
2268
2269 <p>32 bit linkage area</p>
2270
2271 <table class="layout">
2272   <tr>
2273     <td>0</td>
2274     <td>Saved SP (r1)</td>
2275   </tr>
2276   <tr>
2277     <td>4</td>
2278     <td>Saved CR</td>
2279   </tr>
2280   <tr>
2281     <td>8</td>
2282     <td>Saved LR</td>
2283   </tr>
2284   <tr>
2285     <td>12</td>
2286     <td>Reserved</td>
2287   </tr>
2288   <tr>
2289     <td>16</td>
2290     <td>Reserved</td>
2291   </tr>
2292   <tr>
2293     <td>20</td>
2294     <td>Saved FP (r31)</td>
2295   </tr>
2296 </table>
2297
2298 <p>64 bit linkage area</p>
2299
2300 <table class="layout">
2301   <tr>
2302     <td>0</td>
2303     <td>Saved SP (r1)</td>
2304   </tr>
2305   <tr>
2306     <td>8</td>
2307     <td>Saved CR</td>
2308   </tr>
2309   <tr>
2310     <td>16</td>
2311     <td>Saved LR</td>
2312   </tr>
2313   <tr>
2314     <td>24</td>
2315     <td>Reserved</td>
2316   </tr>
2317   <tr>
2318     <td>32</td>
2319     <td>Reserved</td>
2320   </tr>
2321   <tr>
2322     <td>40</td>
2323     <td>Saved FP (r31)</td>
2324   </tr>
2325 </table>
2326
2327 <p>The <i>parameter area</i> is used to store arguments being passed to a callee
2328    function.  Following the PowerPC ABI, the first few arguments are actually
2329    passed in registers, with the space in the parameter area unused.  However,
2330    if there are not enough registers or the callee is a thunk or vararg
2331    function, these register arguments can be spilled into the parameter area.
2332    Thus, the parameter area must be large enough to store all the parameters for
2333    the largest call sequence made by the caller.  The size must also be
2334    minimally large enough to spill registers r3-r10.  This allows callees blind
2335    to the call signature, such as thunks and vararg functions, enough space to
2336    cache the argument registers.  Therefore, the parameter area is minimally 32
2337    bytes (64 bytes in 64 bit mode.)  Also note that since the parameter area is
2338    a fixed offset from the top of the frame, that a callee can access its spilt
2339    arguments using fixed offsets from the stack pointer (or base pointer.)</p>
2340
2341 <p>Combining the information about the linkage, parameter areas and alignment. A
2342    stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
2343    mode.</p>
2344
2345 <p>The <i>dynamic area</i> starts out as size zero.  If a function uses dynamic
2346    alloca then space is added to the stack, the linkage and parameter areas are
2347    shifted to top of stack, and the new space is available immediately below the
2348    linkage and parameter areas.  The cost of shifting the linkage and parameter
2349    areas is minor since only the link value needs to be copied.  The link value
2350    can be easily fetched by adding the original frame size to the base pointer.
2351    Note that allocations in the dynamic space need to observe 16 byte
2352    alignment.</p>
2353
2354 <p>The <i>locals area</i> is where the llvm compiler reserves space for local
2355    variables.</p>
2356
2357 <p>The <i>saved registers area</i> is where the llvm compiler spills callee
2358    saved registers on entry to the callee.</p>
2359
2360 </div>
2361
2362 <!-- _______________________________________________________________________ -->
2363 <div class="doc_subsubsection">
2364   <a name="ppc_prolog">Prolog/Epilog</a>
2365 </div>
2366
2367 <div class="doc_text">
2368
2369 <p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
2370    the following exceptions.  Callee saved registers are spilled after the frame
2371    is created.  This allows the llvm epilog/prolog support to be common with
2372    other targets.  The base pointer callee saved register r31 is saved in the
2373    TOC slot of linkage area.  This simplifies allocation of space for the base
2374    pointer and makes it convenient to locate programatically and during
2375    debugging.</p>
2376
2377 </div>
2378
2379 <!-- _______________________________________________________________________ -->
2380 <div class="doc_subsubsection">
2381   <a name="ppc_dynamic">Dynamic Allocation</a>
2382 </div>
2383
2384 <div class="doc_text">
2385
2386 <p><i>TODO - More to come.</i></p>
2387
2388 </div>
2389
2390
2391 <!-- *********************************************************************** -->
2392 <hr>
2393 <address>
2394   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2395   src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
2396   <a href="http://validator.w3.org/check/referer"><img
2397   src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
2398
2399   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2400   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2401   Last modified: $Date$
2402 </address>
2403
2404 </body>
2405 </html>