docs/CodeGenerator.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <meta http-equiv="content-type" content="text/html; charset=utf-8">
   6   <title>The LLVM Target-Independent Code Generator</title>
   7   <link rel="stylesheet" href="llvm.css" type="text/css">
   8 </head>
   9 <body>
  10
  11 <div class="doc_title">
  12   The LLVM Target-Independent Code Generator
  13 </div>
  14
  15 <ol>
  16   <li><a href="#introduction">Introduction</a>
  17     <ul>
  18       <li><a href="#required">Required components in the code generator</a></li>
  19       <li><a href="#high-level-design">The high-level design of the code
  20           generator</a></li>
  21       <li><a href="#tablegen">Using TableGen for target description</a></li>
  22     </ul>
  23   </li>
  24   <li><a href="#targetdesc">Target description classes</a>
  25     <ul>
  26       <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
  27       <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
  28       <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
  29       <li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
  30       <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
  31       <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
  32       <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
  33       <li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
  34     </ul>
  35   </li>
  36   <li><a href="#codegendesc">The "Machine" Code Generator classes</a>
  37     <ul>
  38     <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
  39     <li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
  40                                      class</a></li>
  41     <li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
  42     </ul>
  43   </li>
  44   <li><a href="#mc">The "MC" Layer</a>
  45     <ul>
  46     <li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
  47     <li><a href="#mccontext">The <tt>MCContext</tt> class</a>
  48     <li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
  49     <li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
  50     <li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
  51     </ul>
  52   </li>
  53   <li><a href="#codegenalgs">Target-independent code generation algorithms</a>
  54     <ul>
  55     <li><a href="#instselect">Instruction Selection</a>
  56       <ul>
  57       <li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
  58       <li><a href="#selectiondag_process">SelectionDAG Code Generation
  59                                           Process</a></li>
  60       <li><a href="#selectiondag_build">Initial SelectionDAG
  61                                         Construction</a></li>
  62       <li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
  63       <li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
  64       <li><a href="#selectiondag_optimize">SelectionDAG Optimization
  65                                            Phase: the DAG Combiner</a></li>
  66       <li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
  67       <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
  68                                         Phase</a></li>
  69       <li><a href="#selectiondag_future">Future directions for the
  70                                          SelectionDAG</a></li>
  71       </ul></li>
  72      <li><a href="#liveintervals">Live Intervals</a>
  73        <ul>
  74        <li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
  75        <li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
  76        </ul></li>
  77     <li><a href="#regalloc">Register Allocation</a>
  78       <ul>
  79       <li><a href="#regAlloc_represent">How registers are represented in
  80                                         LLVM</a></li>
  81       <li><a href="#regAlloc_howTo">Mapping virtual registers to physical
  82                                     registers</a></li>
  83       <li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
  84       <li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
  85       <li><a href="#regAlloc_fold">Instruction folding</a></li>
  86       <li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
  87       </ul></li>
  88     <li><a href="#codeemit">Code Emission</a></li>
  89     </ul>
  90   </li>
  91   <li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
  92
  93   <li><a href="#targetimpls">Target-specific Implementation Notes</a>
  94     <ul>
  95     <li><a href="#targetfeatures">Target Feature Matrix</a></li>
  96     <li><a href="#tailcallopt">Tail call optimization</a></li>
  97     <li><a href="#sibcallopt">Sibling call optimization</a></li>
  98     <li><a href="#x86">The X86 backend</a></li>
  99     <li><a href="#ppc">The PowerPC backend</a>
 100       <ul>
 101       <li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
 102       <li><a href="#ppc_frame">Frame Layout</a></li>
 103       <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
 104       <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
 105       </ul></li>
 106     </ul></li>
 107
 108 </ol>
 109
 110 <div class="doc_author">
 111   <p>Written by the LLVM Team.</p>
 112 </div>
 113
 114 <div class="doc_warning">
 115   <p>Warning: This is a work in progress.</p>
 116 </div>
 117
 118 <!-- *********************************************************************** -->
 119 <div class="doc_section">
 120   <a name="introduction">Introduction</a>
 121 </div>
 122 <!-- *********************************************************************** -->
 123
 124 <div class="doc_text">
 125
 126 <p>The LLVM target-independent code generator is a framework that provides a
 127    suite of reusable components for translating the LLVM internal representation
 128    to the machine code for a specified target&mdash;either in assembly form
 129    (suitable for a static compiler) or in binary machine code format (usable for
 130    a JIT compiler). The LLVM target-independent code generator consists of six
 131    main components:</p>
 132
 133 <ol>
 134   <li><a href="#targetdesc">Abstract target description</a> interfaces which
 135       capture important properties about various aspects of the machine,
 136       independently of how they will be used.  These interfaces are defined in
 137       <tt>include/llvm/Target/</tt>.</li>
 138
 139   <li>Classes used to represent the <a href="#codegendesc">code being
 140       generated</a> for a target.  These classes are intended to be abstract
 141       enough to represent the machine code for <i>any</i> target machine.  These
 142       classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
 143       concepts like "constant pool entries" and "jump tables" are explicitly
 144       exposed.</li>
 145
 146   <li>Classes and algorithms used to represent code as the object file level,
 147       the <a href="#mc">MC Layer</a>.  These classes represent assembly level
 148       constructs like labels, sections, and instructions.  At this level,
 149       concepts like "constant pool entries" and "jump tables" don't exist.</li>
 150
 151   <li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
 152       various phases of native code generation (register allocation, scheduling,
 153       stack frame representation, etc).  This code lives
 154       in <tt>lib/CodeGen/</tt>.</li>
 155
 156   <li><a href="#targetimpls">Implementations of the abstract target description
 157       interfaces</a> for particular targets.  These machine descriptions make
 158       use of the components provided by LLVM, and can optionally provide custom
 159       target-specific passes, to build complete code generators for a specific
 160       target.  Target descriptions live in <tt>lib/Target/</tt>.</li>
 161
 162   <li><a href="#jit">The target-independent JIT components</a>.  The LLVM JIT is
 163       completely target independent (it uses the <tt>TargetJITInfo</tt>
 164       structure to interface for target-specific issues.  The code for the
 165       target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
 166 </ol>
 167
 168 <p>Depending on which part of the code generator you are interested in working
 169    on, different pieces of this will be useful to you.  In any case, you should
 170    be familiar with the <a href="#targetdesc">target description</a>
 171    and <a href="#codegendesc">machine code representation</a> classes.  If you
 172    want to add a backend for a new target, you will need
 173    to <a href="#targetimpls">implement the target description</a> classes for
 174    your new target and understand the <a href="LangRef.html">LLVM code
 175    representation</a>.  If you are interested in implementing a
 176    new <a href="#codegenalgs">code generation algorithm</a>, it should only
 177    depend on the target-description and machine code representation classes,
 178    ensuring that it is portable.</p>
 179
 180 </div>
 181
 182 <!-- ======================================================================= -->
 183 <div class="doc_subsection">
 184  <a name="required">Required components in the code generator</a>
 185 </div>
 186
 187 <div class="doc_text">
 188
 189 <p>The two pieces of the LLVM code generator are the high-level interface to the
 190    code generator and the set of reusable components that can be used to build
 191    target-specific backends.  The two most important interfaces
 192    (<a href="#targetmachine"><tt>TargetMachine</tt></a>
 193    and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
 194    required to be defined for a backend to fit into the LLVM system, but the
 195    others must be defined if the reusable code generator components are going to
 196    be used.</p>
 197
 198 <p>This design has two important implications.  The first is that LLVM can
 199    support completely non-traditional code generation targets.  For example, the
 200    C backend does not require register allocation, instruction selection, or any
 201    of the other standard components provided by the system.  As such, it only
 202    implements these two interfaces, and does its own thing.  Another example of
 203    a code generator like this is a (purely hypothetical) backend that converts
 204    LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
 205
 206 <p>This design also implies that it is possible to design and implement
 207    radically different code generators in the LLVM system that do not make use
 208    of any of the built-in components.  Doing so is not recommended at all, but
 209    could be required for radically different targets that do not fit into the
 210    LLVM machine description model: FPGAs for example.</p>
 211
 212 </div>
 213
 214 <!-- ======================================================================= -->
 215 <div class="doc_subsection">
 216  <a name="high-level-design">The high-level design of the code generator</a>
 217 </div>
 218
 219 <div class="doc_text">
 220
 221 <p>The LLVM target-independent code generator is designed to support efficient
 222    and quality code generation for standard register-based microprocessors.
 223    Code generation in this model is divided into the following stages:</p>
 224
 225 <ol>
 226   <li><b><a href="#instselect">Instruction Selection</a></b> &mdash; This phase
 227       determines an efficient way to express the input LLVM code in the target
 228       instruction set.  This stage produces the initial code for the program in
 229       the target instruction set, then makes use of virtual registers in SSA
 230       form and physical registers that represent any required register
 231       assignments due to target constraints or calling conventions.  This step
 232       turns the LLVM code into a DAG of target instructions.</li>
 233
 234   <li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> &mdash;
 235       This phase takes the DAG of target instructions produced by the
 236       instruction selection phase, determines an ordering of the instructions,
 237       then emits the instructions
 238       as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
 239       Note that we describe this in the <a href="#instselect">instruction
 240       selection section</a> because it operates on
 241       a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
 242
 243   <li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> &mdash;
 244       This optional stage consists of a series of machine-code optimizations
 245       that operate on the SSA-form produced by the instruction selector.
 246       Optimizations like modulo-scheduling or peephole optimization work
 247       here.</li>
 248
 249   <li><b><a href="#regalloc">Register Allocation</a></b> &mdash; The target code
 250       is transformed from an infinite virtual register file in SSA form to the
 251       concrete register file used by the target.  This phase introduces spill
 252       code and eliminates all virtual register references from the program.</li>
 253
 254   <li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> &mdash; Once
 255       the machine code has been generated for the function and the amount of
 256       stack space required is known (used for LLVM alloca's and spill slots),
 257       the prolog and epilog code for the function can be inserted and "abstract
 258       stack location references" can be eliminated.  This stage is responsible
 259       for implementing optimizations like frame-pointer elimination and stack
 260       packing.</li>
 261
 262   <li><b><a href="#latemco">Late Machine Code Optimizations</a></b> &mdash;
 263       Optimizations that operate on "final" machine code can go here, such as
 264       spill code scheduling and peephole optimizations.</li>
 265
 266   <li><b><a href="#codeemit">Code Emission</a></b> &mdash; The final stage
 267       actually puts out the code for the current function, either in the target
 268       assembler format or in machine code.</li>
 269 </ol>
 270
 271 <p>The code generator is based on the assumption that the instruction selector
 272    will use an optimal pattern matching selector to create high-quality
 273    sequences of native instructions.  Alternative code generator designs based
 274    on pattern expansion and aggressive iterative peephole optimization are much
 275    slower.  This design permits efficient compilation (important for JIT
 276    environments) and aggressive optimization (used when generating code offline)
 277    by allowing components of varying levels of sophistication to be used for any
 278    step of compilation.</p>
 279
 280 <p>In addition to these stages, target implementations can insert arbitrary
 281    target-specific passes into the flow.  For example, the X86 target uses a
 282    special pass to handle the 80x87 floating point stack architecture.  Other
 283    targets with unusual requirements can be supported with custom passes as
 284    needed.</p>
 285
 286 </div>
 287
 288 <!-- ======================================================================= -->
 289 <div class="doc_subsection">
 290  <a name="tablegen">Using TableGen for target description</a>
 291 </div>
 292
 293 <div class="doc_text">
 294
 295 <p>The target description classes require a detailed description of the target
 296    architecture.  These target descriptions often have a large amount of common
 297    information (e.g., an <tt>add</tt> instruction is almost identical to a
 298    <tt>sub</tt> instruction).  In order to allow the maximum amount of
 299    commonality to be factored out, the LLVM code generator uses
 300    the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
 301    chunks of the target machine, which allows the use of domain-specific and
 302    target-specific abstractions to reduce the amount of repetition.</p>
 303
 304 <p>As LLVM continues to be developed and refined, we plan to move more and more
 305    of the target description to the <tt>.td</tt> form.  Doing so gives us a
 306    number of advantages.  The most important is that it makes it easier to port
 307    LLVM because it reduces the amount of C++ code that has to be written, and
 308    the surface area of the code generator that needs to be understood before
 309    someone can get something working.  Second, it makes it easier to change
 310    things. In particular, if tables and other things are all emitted
 311    by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
 312    update all of the targets to a new interface.</p>
 313
 314 </div>
 315
 316 <!-- *********************************************************************** -->
 317 <div class="doc_section">
 318   <a name="targetdesc">Target description classes</a>
 319 </div>
 320 <!-- *********************************************************************** -->
 321
 322 <div class="doc_text">
 323
 324 <p>The LLVM target description classes (located in the
 325    <tt>include/llvm/Target</tt> directory) provide an abstract description of
 326    the target machine independent of any particular client.  These classes are
 327    designed to capture the <i>abstract</i> properties of the target (such as the
 328    instructions and registers it has), and do not incorporate any particular
 329    pieces of code generation algorithms.</p>
 330
 331 <p>All of the target description classes (except the
 332    <tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
 333    subclassed by the concrete target implementation, and have virtual methods
 334    implemented.  To get to these implementations, the
 335    <tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
 336    that should be implemented by the target.</p>
 337
 338 </div>
 339
 340 <!-- ======================================================================= -->
 341 <div class="doc_subsection">
 342   <a name="targetmachine">The <tt>TargetMachine</tt> class</a>
 343 </div>
 344
 345 <div class="doc_text">
 346
 347 <p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
 348    access the target-specific implementations of the various target description
 349    classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
 350    <tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.).  This class is
 351    designed to be specialized by a concrete target implementation
 352    (e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
 353    methods.  The only required target description class is
 354    the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
 355    generator components are to be used, the other interfaces should be
 356    implemented as well.</p>
 357
 358 </div>
 359
 360 <!-- ======================================================================= -->
 361 <div class="doc_subsection">
 362   <a name="targetdata">The <tt>TargetData</tt> class</a>
 363 </div>
 364
 365 <div class="doc_text">
 366
 367 <p>The <tt>TargetData</tt> class is the only required target description class,
 368    and it is the only class that is not extensible (you cannot derived a new
 369    class from it).  <tt>TargetData</tt> specifies information about how the
 370    target lays out memory for structures, the alignment requirements for various
 371    data types, the size of pointers in the target, and whether the target is
 372    little-endian or big-endian.</p>
 373
 374 </div>
 375
 376 <!-- ======================================================================= -->
 377 <div class="doc_subsection">
 378   <a name="targetlowering">The <tt>TargetLowering</tt> class</a>
 379 </div>
 380
 381 <div class="doc_text">
 382
 383 <p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
 384    selectors primarily to describe how LLVM code should be lowered to
 385    SelectionDAG operations.  Among other things, this class indicates:</p>
 386
 387 <ul>
 388   <li>an initial register class to use for various <tt>ValueType</tt>s,</li>
 389
 390   <li>which operations are natively supported by the target machine,</li>
 391
 392   <li>the return type of <tt>setcc</tt> operations,</li>
 393
 394   <li>the type to use for shift amounts, and</li>
 395
 396   <li>various high-level characteristics, like whether it is profitable to turn
 397       division by a constant into a multiplication sequence</li>
 398 </ul>
 399
 400 </div>
 401
 402 <!-- ======================================================================= -->
 403 <div class="doc_subsection">
 404   <a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
 405 </div>
 406
 407 <div class="doc_text">
 408
 409 <p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
 410    of the target and any interactions between the registers.</p>
 411
 412 <p>Registers in the code generator are represented in the code generator by
 413    unsigned integers.  Physical registers (those that actually exist in the
 414    target description) are unique small numbers, and virtual registers are
 415    generally large.  Note that register #0 is reserved as a flag value.</p>
 416
 417 <p>Each register in the processor description has an associated
 418    <tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
 419    register (used for assembly output and debugging dumps) and a set of aliases
 420    (used to indicate whether one register overlaps with another).</p>
 421
 422 <p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
 423    class exposes a set of processor specific register classes (instances of the
 424    <tt>TargetRegisterClass</tt> class).  Each register class contains sets of
 425    registers that have the same properties (for example, they are all 32-bit
 426    integer registers).  Each SSA virtual register created by the instruction
 427    selector has an associated register class.  When the register allocator runs,
 428    it replaces virtual registers with a physical register in the set.</p>
 429
 430 <p>The target-specific implementations of these classes is auto-generated from
 431    a <a href="TableGenFundamentals.html">TableGen</a> description of the
 432    register file.</p>
 433
 434 </div>
 435
 436 <!-- ======================================================================= -->
 437 <div class="doc_subsection">
 438   <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
 439 </div>
 440
 441 <div class="doc_text">
 442
 443 <p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
 444    instructions supported by the target. It is essentially an array of
 445    <tt>TargetInstrDescriptor</tt> objects, each of which describes one
 446    instruction the target supports. Descriptors define things like the mnemonic
 447    for the opcode, the number of operands, the list of implicit register uses
 448    and defs, whether the instruction has certain target-independent properties
 449    (accesses memory, is commutable, etc), and holds any target-specific
 450    flags.</p>
 451
 452 </div>
 453
 454 <!-- ======================================================================= -->
 455 <div class="doc_subsection">
 456   <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
 457 </div>
 458
 459 <div class="doc_text">
 460
 461 <p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
 462    stack frame layout of the target. It holds the direction of stack growth, the
 463    known stack alignment on entry to each function, and the offset to the local
 464    area.  The offset to the local area is the offset from the stack pointer on
 465    function entry to the first location where function data (local variables,
 466    spill locations) can be stored.</p>
 467
 468 </div>
 469
 470 <!-- ======================================================================= -->
 471 <div class="doc_subsection">
 472   <a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
 473 </div>
 474
 475 <div class="doc_text">
 476
 477 <p>The <tt>TargetSubtarget</tt> class is used to provide information about the
 478    specific chip set being targeted.  A sub-target informs code generation of
 479    which instructions are supported, instruction latencies and instruction
 480    execution itinerary; i.e., which processing units are used, in what order,
 481    and for how long.</p>
 482
 483 </div>
 484
 485
 486 <!-- ======================================================================= -->
 487 <div class="doc_subsection">
 488   <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
 489 </div>
 490
 491 <div class="doc_text">
 492
 493 <p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
 494    Just-In-Time code generator to perform target-specific activities, such as
 495    emitting stubs.  If a <tt>TargetMachine</tt> supports JIT code generation, it
 496    should provide one of these objects through the <tt>getJITInfo</tt>
 497    method.</p>
 498
 499 </div>
 500
 501 <!-- *********************************************************************** -->
 502 <div class="doc_section">
 503   <a name="codegendesc">Machine code description classes</a>
 504 </div>
 505 <!-- *********************************************************************** -->
 506
 507 <div class="doc_text">
 508
 509 <p>At the high-level, LLVM code is translated to a machine specific
 510    representation formed out of
 511    <a href="#machinefunction"><tt>MachineFunction</tt></a>,
 512    <a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
 513    and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
 514    in <tt>include/llvm/CodeGen</tt>).  This representation is completely target
 515    agnostic, representing instructions in their most abstract form: an opcode
 516    and a series of operands.  This representation is designed to support both an
 517    SSA representation for machine code, as well as a register allocated, non-SSA
 518    form.</p>
 519
 520 </div>
 521
 522 <!-- ======================================================================= -->
 523 <div class="doc_subsection">
 524   <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
 525 </div>
 526
 527 <div class="doc_text">
 528
 529 <p>Target machine instructions are represented as instances of the
 530    <tt>MachineInstr</tt> class.  This class is an extremely abstract way of
 531    representing machine instructions.  In particular, it only keeps track of an
 532    opcode number and a set of operands.</p>
 533
 534 <p>The opcode number is a simple unsigned integer that only has meaning to a
 535    specific backend.  All of the instructions for a target should be defined in
 536    the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
 537    auto-generated from this description.  The <tt>MachineInstr</tt> class does
 538    not have any information about how to interpret the instruction (i.e., what
 539    the semantics of the instruction are); for that you must refer to the
 540    <tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
 541
 542 <p>The operands of a machine instruction can be of several different types: a
 543    register reference, a constant integer, a basic block reference, etc.  In
 544    addition, a machine operand should be marked as a def or a use of the value
 545    (though only registers are allowed to be defs).</p>
 546
 547 <p>By convention, the LLVM code generator orders instruction operands so that
 548    all register definitions come before the register uses, even on architectures
 549    that are normally printed in other orders.  For example, the SPARC add
 550    instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1", and "%i2" registers
 551    and stores the result into the "%i3" register.  In the LLVM code generator,
 552    the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
 553    destination first.</p>
 554
 555 <p>Keeping destination (definition) operands at the beginning of the operand
 556    list has several advantages.  In particular, the debugging printer will print
 557    the instruction like this:</p>
 558
 559 <div class="doc_code">
 560 <pre>
 561 %r3 = add %i1, %i2
 562 </pre>
 563 </div>
 564
 565 <p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
 566    instructions</a> whose only def is the first operand.</p>
 567
 568 </div>
 569
 570 <!-- _______________________________________________________________________ -->
 571 <div class="doc_subsubsection">
 572   <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
 573 </div>
 574
 575 <div class="doc_text">
 576
 577 <p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
 578    located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file.  The
 579    <tt>BuildMI</tt> functions make it easy to build arbitrary machine
 580    instructions.  Usage of the <tt>BuildMI</tt> functions look like this:</p>
 581
 582 <div class="doc_code">
 583 <pre>
 584 // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
 585 // instruction.  The '1' specifies how many operands will be added.
 586 MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
 587
 588 // Create the same instr, but insert it at the end of a basic block.
 589 MachineBasicBlock &amp;MBB = ...
 590 BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
 591
 592 // Create the same instr, but insert it before a specified iterator point.
 593 MachineBasicBlock::iterator MBBI = ...
 594 BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
 595
 596 // Create a 'cmp Reg, 0' instruction, no destination reg.
 597 MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
 598 // Create an 'sahf' instruction which takes no operands and stores nothing.
 599 MI = BuildMI(X86::SAHF, 0);
 600
 601 // Create a self looping branch instruction.
 602 BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
 603 </pre>
 604 </div>
 605
 606 <p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
 607    have to specify the number of operands that the machine instruction will
 608    take.  This allows for efficient memory allocation.  You also need to specify
 609    if operands default to be uses of values, not definitions.  If you need to
 610    add a definition operand (other than the optional destination register), you
 611    must explicitly mark it as such:</p>
 612
 613 <div class="doc_code">
 614 <pre>
 615 MI.addReg(Reg, RegState::Define);
 616 </pre>
 617 </div>
 618
 619 </div>
 620
 621 <!-- _______________________________________________________________________ -->
 622 <div class="doc_subsubsection">
 623   <a name="fixedregs">Fixed (preassigned) registers</a>
 624 </div>
 625
 626 <div class="doc_text">
 627
 628 <p>One important issue that the code generator needs to be aware of is the
 629    presence of fixed registers.  In particular, there are often places in the
 630    instruction stream where the register allocator <em>must</em> arrange for a
 631    particular value to be in a particular register.  This can occur due to
 632    limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
 633    with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
 634    calling conventions.  In any case, the instruction selector should emit code
 635    that copies a virtual register into or out of a physical register when
 636    needed.</p>
 637
 638 <p>For example, consider this simple LLVM example:</p>
 639
 640 <div class="doc_code">
 641 <pre>
 642 define i32 @test(i32 %X, i32 %Y) {
 643   %Z = udiv i32 %X, %Y
 644   ret i32 %Z
 645 }
 646 </pre>
 647 </div>
 648
 649 <p>The X86 instruction selector produces this machine code for the <tt>div</tt>
 650    and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
 651    get this):</p>
 652
 653 <div class="doc_code">
 654 <pre>
 655 ;; Start of div
 656 %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
 657 %reg1027 = sar %reg1024, 31
 658 %EDX = mov %reg1027           ;; Sign extend X into EDX
 659 idiv %reg1025                 ;; Divide by Y (in reg1025)
 660 %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
 661
 662 ;; Start of ret
 663 %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
 664 ret
 665 </pre>
 666 </div>
 667
 668 <p>By the end of code generation, the register allocator has coalesced the
 669    registers and deleted the resultant identity moves producing the following
 670    code:</p>
 671
 672 <div class="doc_code">
 673 <pre>
 674 ;; X is in EAX, Y is in ECX
 675 mov %EAX, %EDX
 676 sar %EDX, 31
 677 idiv %ECX
 678 ret
 679 </pre>
 680 </div>
 681
 682 <p>This approach is extremely general (if it can handle the X86 architecture, it
 683    can handle anything!) and allows all of the target specific knowledge about
 684    the instruction stream to be isolated in the instruction selector.  Note that
 685    physical registers should have a short lifetime for good code generation, and
 686    all physical registers are assumed dead on entry to and exit from basic
 687    blocks (before register allocation).  Thus, if you need a value to be live
 688    across basic block boundaries, it <em>must</em> live in a virtual
 689    register.</p>
 690
 691 </div>
 692
 693 <!-- _______________________________________________________________________ -->
 694 <div class="doc_subsubsection">
 695   <a name="ssa">Machine code in SSA form</a>
 696 </div>
 697
 698 <div class="doc_text">
 699
 700 <p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
 701    maintained in SSA-form until register allocation happens.  For the most part,
 702    this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
 703    become machine code PHI nodes, and virtual registers are only allowed to have
 704    a single definition.</p>
 705
 706 <p>After register allocation, machine code is no longer in SSA-form because
 707    there are no virtual registers left in the code.</p>
 708
 709 </div>
 710
 711 <!-- ======================================================================= -->
 712 <div class="doc_subsection">
 713   <a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
 714 </div>
 715
 716 <div class="doc_text">
 717
 718 <p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
 719    (<tt><a href="#machineinstr">MachineInstr</a></tt> instances).  It roughly
 720    corresponds to the LLVM code input to the instruction selector, but there can
 721    be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
 722    machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
 723    "<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
 724    comes from.</p>
 725
 726 </div>
 727
 728 <!-- ======================================================================= -->
 729 <div class="doc_subsection">
 730   <a name="machinefunction">The <tt>MachineFunction</tt> class</a>
 731 </div>
 732
 733 <div class="doc_text">
 734
 735 <p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
 736    (<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances).  It
 737    corresponds one-to-one with the LLVM function input to the instruction
 738    selector.  In addition to a list of basic blocks,
 739    the <tt>MachineFunction</tt> contains a a <tt>MachineConstantPool</tt>,
 740    a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
 741    <tt>MachineRegisterInfo</tt>.  See
 742    <tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
 743
 744 </div>
 745
 746
 747 <!-- *********************************************************************** -->
 748 <div class="doc_section">
 749   <a name="mc">The "MC" Layer</a>
 750 </div>
 751 <!-- *********************************************************************** -->
 752
 753 <div class="doc_text">
 754
 755 <p>
 756 The MC Layer is used to represent and process code at the raw machine code
 757 level, devoid of "high level" information like "constant pools", "jump tables",
 758 "global variables" or anything like that.  At this level, LLVM handles things
 759 like label names, machine instructions, and sections in the object file.  The
 760 code in this layer is used for a number of important purposes: the tail end of
 761 the code generator uses it to write a .s or .o file, and it is also used by the
 762 llvm-mc tool to implement standalone machine codeassemblers and disassemblers.
 763 </p>
 764
 765 <p>
 766 This section describes some of the important classes.  There are also a number
 767 of important subsystems that interact at this layer, they are described later
 768 in this manual.
 769 </p>
 770
 771 </div>
 772
 773
 774 <!-- ======================================================================= -->
 775 <div class="doc_subsection">
 776   <a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
 777 </div>
 778
 779 <div class="doc_text">
 780
 781 <p>
 782 MCStreamer is best thought of as an assembler API.  It is an abstract API which
 783 is <em>implemented</em> in different ways (e.g. to output a .s file, output an
 784 ELF .o file, etc) but whose API correspond directly to what you see in a .s
 785 file.  MCStreamer has one method per directive, such as EmitLabel,
 786 EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
 787 directly correspond to assembly level directives.  It also has an
 788 EmitInstruction method, which is used to output an MCInst to the streamer.
 789 </p>
 790
 791 <p>
 792 This API is most important for two clients: the llvm-mc stand-alone assembler is
 793 effectively a parser that parses a line, then invokes a method on MCStreamer. In
 794 the code generator, the <a href="#codeemit">Code Emission</a> phase of the code
 795 generator lowers higher level LLVM IR and Machine* constructs down to the MC
 796 layer, emitting directives through MCStreamer.</p>
 797
 798 <p>
 799 On the implementation side of MCStreamer, there are two major implementations:
 800 one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
 801 file (MCObjectStreamer).  MCAsmStreamer is a straight-forward implementation
 802 that prints out a directive for each method (e.g. EmitValue -&gt; .byte), but
 803 MCObjectStreamer implements a full assembler.
 804 </p>
 805
 806 </div>
 807
 808 <!-- ======================================================================= -->
 809 <div class="doc_subsection">
 810   <a name="mccontext">The <tt>MCContext</tt> class</a>
 811 </div>
 812
 813 <div class="doc_text">
 814
 815 <p>
 816 The MCContext class is the owner of a variety of uniqued data structures at the
 817 MC layer, including symbols, sections, etc.  As such, this is the class that you
 818 interact with to create symbols and sections.  This class can not be subclassed.
 819 </p>
 820
 821 </div>
 822
 823 <!-- ======================================================================= -->
 824 <div class="doc_subsection">
 825   <a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
 826 </div>
 827
 828 <div class="doc_text">
 829
 830 <p>
 831 The MCSymbol class represents a symbol (aka label) in the assembly file.  There
 832 are two interesting kinds of symbols: assembler temporary symbols, and normal
 833 symbols.  Assembler temporary symbols are used and processed by the assembler
 834 but are discarded when the object file is produced.  The distinction is usually
 835 represented by adding a prefix to the label, for example "L" labels are
 836 assembler temporary labels in MachO.
 837 </p>
 838
 839 <p>MCSymbols are created by MCContext and uniqued there.  This means that
 840 MCSymbols can be compared for pointer equivalence to find out if they are the
 841 same symbol.  Note that pointer inequality does not guarantee the labels will
 842 end up at different addresses though.  It's perfectly legal to output something
 843 like this to the .s file:<p>
 844
 845 <pre>
 846   foo:
 847   bar:
 848     .byte 4
 849 </pre>
 850
 851 <p>In this case, both the foo and bar symbols will have the same address.</p>
 852
 853 </div>
 854
 855 <!-- ======================================================================= -->
 856 <div class="doc_subsection">
 857   <a name="mcsection">The <tt>MCSection</tt> class</a>
 858 </div>
 859
 860 <div class="doc_text">
 861
 862 <p>
 863 The MCSection class represents an object-file specific section. It is subclassed
 864 by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
 865 <tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
 866 by MCContext.  The MCStreamer has a notion of the current section, which can be
 867 changed with the SwitchToSection method (which corresponds to a ".section"
 868 directive in a .s file).
 869 </p>
 870
 871 </div>
 872
 873 <!-- ======================================================================= -->
 874 <div class="doc_subsection">
 875   <a name="mcinst">The <tt>MCInst</tt> class</a></li>
 876 </div>
 877
 878 <div class="doc_text">
 879
 880 <p>
 881 The MCInst class is a target-independent representation of an instruction.  It
 882 is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
 883 that holds a target-specific opcode and a vector of MCOperands.  MCOperand, in
 884 turn, is a simple discriminated union of three cases: 1) a simple immediate,
 885 2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
 886 MCExpr.
 887 </p>
 888
 889 <p>MCInst is the common currency used to represent machine instructions at the
 890 MC layer.  It is the type used by the instruction encoder, the instruction
 891 printer, and the type generated by the assembly parser and disassembler.
 892 </p>
 893
 894 </div>
 895
 896
 897 <!-- *********************************************************************** -->
 898 <div class="doc_section">
 899   <a name="codegenalgs">Target-independent code generation algorithms</a>
 900 </div>
 901 <!-- *********************************************************************** -->
 902
 903 <div class="doc_text">
 904
 905 <p>This section documents the phases described in the
 906    <a href="#high-level-design">high-level design of the code generator</a>.
 907    It explains how they work and some of the rationale behind their design.</p>
 908
 909 </div>
 910
 911 <!-- ======================================================================= -->
 912 <div class="doc_subsection">
 913   <a name="instselect">Instruction Selection</a>
 914 </div>
 915
 916 <div class="doc_text">
 917
 918 <p>Instruction Selection is the process of translating LLVM code presented to
 919    the code generator into target-specific machine instructions.  There are
 920    several well-known ways to do this in the literature.  LLVM uses a
 921    SelectionDAG based instruction selector.</p>
 922
 923 <p>Portions of the DAG instruction selector are generated from the target
 924    description (<tt>*.td</tt>) files.  Our goal is for the entire instruction
 925    selector to be generated from these <tt>.td</tt> files, though currently
 926    there are still things that require custom C++ code.</p>
 927
 928 </div>
 929
 930 <!-- _______________________________________________________________________ -->
 931 <div class="doc_subsubsection">
 932   <a name="selectiondag_intro">Introduction to SelectionDAGs</a>
 933 </div>
 934
 935 <div class="doc_text">
 936
 937 <p>The SelectionDAG provides an abstraction for code representation in a way
 938    that is amenable to instruction selection using automatic techniques
 939    (e.g. dynamic-programming based optimal pattern matching selectors). It is
 940    also well-suited to other phases of code generation; in particular,
 941    instruction scheduling (SelectionDAG's are very close to scheduling DAGs
 942    post-selection).  Additionally, the SelectionDAG provides a host
 943    representation where a large variety of very-low-level (but
 944    target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
 945    performed; ones which require extensive information about the instructions
 946    efficiently supported by the target.</p>
 947
 948 <p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
 949    <tt>SDNode</tt> class.  The primary payload of the <tt>SDNode</tt> is its
 950    operation code (Opcode) that indicates what operation the node performs and
 951    the operands to the operation.  The various operation node types are
 952    described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
 953    file.</p>
 954
 955 <p>Although most operations define a single value, each node in the graph may
 956    define multiple values.  For example, a combined div/rem operation will
 957    define both the dividend and the remainder. Many other situations require
 958    multiple values as well.  Each node also has some number of operands, which
 959    are edges to the node defining the used value.  Because nodes may define
 960    multiple values, edges are represented by instances of the <tt>SDValue</tt>
 961    class, which is a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node
 962    and result value being used, respectively.  Each value produced by
 963    an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
 964    indicating what the type of the value is.</p>
 965
 966 <p>SelectionDAGs contain two different kinds of values: those that represent
 967    data flow and those that represent control flow dependencies.  Data values
 968    are simple edges with an integer or floating point value type.  Control edges
 969    are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
 970    These edges provide an ordering between nodes that have side effects (such as
 971    loads, stores, calls, returns, etc).  All nodes that have side effects should
 972    take a token chain as input and produce a new one as output.  By convention,
 973    token chain inputs are always operand #0, and chain results are always the
 974    last value produced by an operation.</p>
 975
 976 <p>A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
 977    always a marker node with an Opcode of <tt>ISD::EntryToken</tt>.  The Root
 978    node is the final side-effecting node in the token chain. For example, in a
 979    single basic block function it would be the return node.</p>
 980
 981 <p>One important concept for SelectionDAGs is the notion of a "legal" vs.
 982    "illegal" DAG.  A legal DAG for a target is one that only uses supported
 983    operations and supported types.  On a 32-bit PowerPC, for example, a DAG with
 984    a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
 985    uses a SREM or UREM operation.  The
 986    <a href="#selectinodag_legalize_types">legalize types</a> and
 987    <a href="#selectiondag_legalize">legalize operations</a> phases are
 988    responsible for turning an illegal DAG into a legal DAG.</p>
 989
 990 </div>
 991
 992 <!-- _______________________________________________________________________ -->
 993 <div class="doc_subsubsection">
 994   <a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
 995 </div>
 996
 997 <div class="doc_text">
 998
 999 <p>SelectionDAG-based instruction selection consists of the following steps:</p>
1000
1001 <ol>
1002   <li><a href="#selectiondag_build">Build initial DAG</a> &mdash; This stage
1003       performs a simple translation from the input LLVM code to an illegal
1004       SelectionDAG.</li>
1005
1006   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; This
1007       stage performs simple optimizations on the SelectionDAG to simplify it,
1008       and recognize meta instructions (like rotates
1009       and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
1010       operations.  This makes the resultant code more efficient and
1011       the <a href="#selectiondag_select">select instructions from DAG</a> phase
1012       (below) simpler.</li>
1013
1014   <li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
1015       &mdash; This stage transforms SelectionDAG nodes to eliminate any types
1016       that are unsupported on the target.</li>
1017
1018   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; The
1019       SelectionDAG optimizer is run to clean up redundancies exposed by type
1020       legalization.</li>
1021
1022   <li><a href="#selectiondag_legalize">Legalize SelectionDAG Types</a> &mdash;
1023       This stage transforms SelectionDAG nodes to eliminate any types that are
1024       unsupported on the target.</li>
1025
1026   <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> &mdash; The
1027       SelectionDAG optimizer is run to eliminate inefficiencies introduced by
1028       operation legalization.</li>
1029
1030   <li><a href="#selectiondag_select">Select instructions from DAG</a> &mdash;
1031       Finally, the target instruction selector matches the DAG operations to
1032       target instructions.  This process translates the target-independent input
1033       DAG into another DAG of target instructions.</li>
1034
1035   <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
1036       &mdash; The last phase assigns a linear order to the instructions in the
1037       target-instruction DAG and emits them into the MachineFunction being
1038       compiled.  This step uses traditional prepass scheduling techniques.</li>
1039 </ol>
1040
1041 <p>After all of these steps are complete, the SelectionDAG is destroyed and the
1042    rest of the code generation passes are run.</p>
1043
1044 <p>One great way to visualize what is going on here is to take advantage of a
1045    few LLC command line options.  The following options pop up a window
1046    displaying the SelectionDAG at specific times (if you only get errors printed
1047    to the console while using this, you probably
1048    <a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
1049    to add support for it).</p>
1050
1051 <ul>
1052   <li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
1053       before the first optimization pass.</li>
1054
1055   <li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
1056
1057   <li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
1058       optimization pass.</li>
1059
1060   <li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
1061
1062   <li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
1063 </ul>
1064
1065 <p>The <tt>-view-sunit-dags</tt> displays the Scheduler's dependency graph.
1066    This graph is based on the final SelectionDAG, with nodes that must be
1067    scheduled together bundled into a single scheduling-unit node, and with
1068    immediate operands and other nodes that aren't relevant for scheduling
1069    omitted.</p>
1070
1071 </div>
1072
1073 <!-- _______________________________________________________________________ -->
1074 <div class="doc_subsubsection">
1075   <a name="selectiondag_build">Initial SelectionDAG Construction</a>
1076 </div>
1077
1078 <div class="doc_text">
1079
1080 <p>The initial SelectionDAG is na&iuml;vely peephole expanded from the LLVM
1081    input by the <tt>SelectionDAGLowering</tt> class in the
1082    <tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file.  The intent of
1083    this pass is to expose as much low-level, target-specific details to the
1084    SelectionDAG as possible.  This pass is mostly hard-coded (e.g. an
1085    LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
1086    <tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
1087    requires target-specific hooks to lower calls, returns, varargs, etc.  For
1088    these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
1089    interface is used.</p>
1090
1091 </div>
1092
1093 <!-- _______________________________________________________________________ -->
1094 <div class="doc_subsubsection">
1095   <a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
1096 </div>
1097
1098 <div class="doc_text">
1099
1100 <p>The Legalize phase is in charge of converting a DAG to only use the types
1101    that are natively supported by the target.</p>
1102
1103 <p>There are two main ways of converting values of unsupported scalar types to
1104    values of supported types: converting small types to larger types
1105    ("promoting"), and breaking up large integer types into smaller ones
1106    ("expanding").  For example, a target might require that all f32 values are
1107    promoted to f64 and that all i1/i8/i16 values are promoted to i32.  The same
1108    target might require that all i64 values be expanded into pairs of i32
1109    values.  These changes can insert sign and zero extensions as needed to make
1110    sure that the final code has the same behavior as the input.</p>
1111
1112 <p>There are two main ways of converting values of unsupported vector types to
1113    value of supported types: splitting vector types, multiple times if
1114    necessary, until a legal type is found, and extending vector types by adding
1115    elements to the end to round them out to legal types ("widening").  If a
1116    vector gets split all the way down to single-element parts with no supported
1117    vector type being found, the elements are converted to scalars
1118    ("scalarizing").</p>
1119
1120 <p>A target implementation tells the legalizer which types are supported (and
1121    which register class to use for them) by calling the
1122    <tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
1123
1124 </div>
1125
1126 <!-- _______________________________________________________________________ -->
1127 <div class="doc_subsubsection">
1128   <a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
1129 </div>
1130
1131 <div class="doc_text">
1132
1133 <p>The Legalize phase is in charge of converting a DAG to only use the
1134    operations that are natively supported by the target.</p>
1135
1136 <p>Targets often have weird constraints, such as not supporting every operation
1137    on every supported datatype (e.g. X86 does not support byte conditional moves
1138    and PowerPC does not support sign-extending loads from a 16-bit memory
1139    location).  Legalize takes care of this by open-coding another sequence of
1140    operations to emulate the operation ("expansion"), by promoting one type to a
1141    larger type that supports the operation ("promotion"), or by using a
1142    target-specific hook to implement the legalization ("custom").</p>
1143
1144 <p>A target implementation tells the legalizer which operations are not
1145    supported (and which of the above three actions to take) by calling the
1146    <tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
1147    constructor.</p>
1148
1149 <p>Prior to the existence of the Legalize passes, we required that every target
1150    <a href="#selectiondag_optimize">selector</a> supported and handled every
1151    operator and type even if they are not natively supported.  The introduction
1152    of the Legalize phases allows all of the canonicalization patterns to be
1153    shared across targets, and makes it very easy to optimize the canonicalized
1154    code because it is still in the form of a DAG.</p>
1155
1156 </div>
1157
1158 <!-- _______________________________________________________________________ -->
1159 <div class="doc_subsubsection">
1160   <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
1161   Combiner</a>
1162 </div>
1163
1164 <div class="doc_text">
1165
1166 <p>The SelectionDAG optimization phase is run multiple times for code
1167    generation, immediately after the DAG is built and once after each
1168    legalization.  The first run of the pass allows the initial code to be
1169    cleaned up (e.g. performing optimizations that depend on knowing that the
1170    operators have restricted type inputs).  Subsequent runs of the pass clean up
1171    the messy code generated by the Legalize passes, which allows Legalize to be
1172    very simple (it can focus on making code legal instead of focusing on
1173    generating <em>good</em> and legal code).</p>
1174
1175 <p>One important class of optimizations performed is optimizing inserted sign
1176    and zero extension instructions.  We currently use ad-hoc techniques, but
1177    could move to more rigorous techniques in the future.  Here are some good
1178    papers on the subject:</p>
1179
1180 <p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
1181    integer arithmetic</a>"<br>
1182    Kevin Redwine and Norman Ramsey<br>
1183    International Conference on Compiler Construction (CC) 2004</p>
1184
1185 <p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
1186    sign extension elimination</a>"<br>
1187    Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
1188    Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
1189    and Implementation.</p>
1190
1191 </div>
1192
1193 <!-- _______________________________________________________________________ -->
1194 <div class="doc_subsubsection">
1195   <a name="selectiondag_select">SelectionDAG Select Phase</a>
1196 </div>
1197
1198 <div class="doc_text">
1199
1200 <p>The Select phase is the bulk of the target-specific code for instruction
1201    selection.  This phase takes a legal SelectionDAG as input, pattern matches
1202    the instructions supported by the target to this DAG, and produces a new DAG
1203    of target code.  For example, consider the following LLVM fragment:</p>
1204
1205 <div class="doc_code">
1206 <pre>
1207 %t1 = fadd float %W, %X
1208 %t2 = fmul float %t1, %Y
1209 %t3 = fadd float %t2, %Z
1210 </pre>
1211 </div>
1212
1213 <p>This LLVM code corresponds to a SelectionDAG that looks basically like
1214    this:</p>
1215
1216 <div class="doc_code">
1217 <pre>
1218 (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
1219 </pre>
1220 </div>
1221
1222 <p>If a target supports floating point multiply-and-add (FMA) operations, one of
1223    the adds can be merged with the multiply.  On the PowerPC, for example, the
1224    output of the instruction selector might look like this DAG:</p>
1225
1226 <div class="doc_code">
1227 <pre>
1228 (FMADDS (FADDS W, X), Y, Z)
1229 </pre>
1230 </div>
1231
1232 <p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
1233 first two operands and adds the third (as single-precision floating-point
1234 numbers).  The <tt>FADDS</tt> instruction is a simple binary single-precision
1235 add instruction.  To perform this pattern match, the PowerPC backend includes
1236 the following instruction definitions:</p>
1237
1238 <div class="doc_code">
1239 <pre>
1240 def FMADDS : AForm_1&lt;59, 29,
1241                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
1242                     "fmadds $FRT, $FRA, $FRC, $FRB",
1243                     [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
1244                                            F4RC:$FRB))</b>]&gt;;
1245 def FADDS : AForm_2&lt;59, 21,
1246                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
1247                     "fadds $FRT, $FRA, $FRB",
1248                     [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
1249 </pre>
1250 </div>
1251
1252 <p>The portion of the instruction definition in bold indicates the pattern used
1253    to match the instruction.  The DAG operators
1254    (like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
1255    the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file.  "
1256    <tt>F4RC</tt>" is the register class of the input and result values.</p>
1257
1258 <p>The TableGen DAG instruction selector generator reads the instruction
1259    patterns in the <tt>.td</tt> file and automatically builds parts of the
1260    pattern matching code for your target.  It has the following strengths:</p>
1261
1262 <ul>
1263   <li>At compiler-compiler time, it analyzes your instruction patterns and tells
1264       you if your patterns make sense or not.</li>
1265
1266   <li>It can handle arbitrary constraints on operands for the pattern match.  In
1267       particular, it is straight-forward to say things like "match any immediate
1268       that is a 13-bit sign-extended value".  For examples, see the
1269       <tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
1270       backend.</li>
1271
1272   <li>It knows several important identities for the patterns defined.  For
1273       example, it knows that addition is commutative, so it allows the
1274       <tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
1275       well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
1276       to specially handle this case.</li>
1277
1278   <li>It has a full-featured type-inferencing system.  In particular, you should
1279       rarely have to explicitly tell the system what type parts of your patterns
1280       are.  In the <tt>FMADDS</tt> case above, we didn't have to tell
1281       <tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
1282       It was able to infer and propagate this knowledge from the fact that
1283       <tt>F4RC</tt> has type 'f32'.</li>
1284
1285   <li>Targets can define their own (and rely on built-in) "pattern fragments".
1286       Pattern fragments are chunks of reusable patterns that get inlined into
1287       your patterns during compiler-compiler time.  For example, the integer
1288       "<tt>(not x)</tt>" operation is actually defined as a pattern fragment
1289       that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
1290       have a native '<tt>not</tt>' operation.  Targets can define their own
1291       short-hand fragments as they see fit.  See the definition of
1292       '<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
1293
1294   <li>In addition to instructions, targets can specify arbitrary patterns that
1295       map to one or more instructions using the 'Pat' class.  For example, the
1296       PowerPC has no way to load an arbitrary integer immediate into a register
1297       in one instruction. To tell tblgen how to do this, it defines:
1298       <br>
1299       <br>
1300 <div class="doc_code">
1301 <pre>
1302 // Arbitrary immediate support.  Implement in terms of LIS/ORI.
1303 def : Pat&lt;(i32 imm:$imm),
1304           (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
1305 </pre>
1306 </div>
1307       <br>
1308       If none of the single-instruction patterns for loading an immediate into a
1309       register match, this will be used.  This rule says "match an arbitrary i32
1310       immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
1311       an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
1312       the left 16 bits') instruction".  To make this work, the
1313       <tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
1314       the input immediate (in this case, take the high or low 16-bits of the
1315       immediate).</li>
1316
1317   <li>While the system does automate a lot, it still allows you to write custom
1318       C++ code to match special cases if there is something that is hard to
1319       express.</li>
1320 </ul>
1321
1322 <p>While it has many strengths, the system currently has some limitations,
1323    primarily because it is a work in progress and is not yet finished:</p>
1324
1325 <ul>
1326   <li>Overall, there is no way to define or match SelectionDAG nodes that define
1327       multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
1328       etc).  This is the biggest reason that you currently still <em>have
1329       to</em> write custom C++ code for your instruction selector.</li>
1330
1331   <li>There is no great way to support matching complex addressing modes yet.
1332       In the future, we will extend pattern fragments to allow them to define
1333       multiple values (e.g. the four operands of the <a href="#x86_memory">X86
1334       addressing mode</a>, which are currently matched with custom C++ code).
1335       In addition, we'll extend fragments so that a fragment can match multiple
1336       different patterns.</li>
1337
1338   <li>We don't automatically infer flags like isStore/isLoad yet.</li>
1339
1340   <li>We don't automatically generate the set of supported registers and
1341       operations for the <a href="#selectiondag_legalize">Legalizer</a>
1342       yet.</li>
1343
1344   <li>We don't have a way of tying in custom legalized nodes yet.</li>
1345 </ul>
1346
1347 <p>Despite these limitations, the instruction selector generator is still quite
1348    useful for most of the binary and logical operations in typical instruction
1349    sets.  If you run into any problems or can't figure out how to do something,
1350    please let Chris know!</p>
1351
1352 </div>
1353
1354 <!-- _______________________________________________________________________ -->
1355 <div class="doc_subsubsection">
1356   <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
1357 </div>
1358
1359 <div class="doc_text">
1360
1361 <p>The scheduling phase takes the DAG of target instructions from the selection
1362    phase and assigns an order.  The scheduler can pick an order depending on
1363    various constraints of the machines (i.e. order for minimal register pressure
1364    or try to cover instruction latencies).  Once an order is established, the
1365    DAG is converted to a list
1366    of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
1367    destroyed.</p>
1368
1369 <p>Note that this phase is logically separate from the instruction selection
1370    phase, but is tied to it closely in the code because it operates on
1371    SelectionDAGs.</p>
1372
1373 </div>
1374
1375 <!-- _______________________________________________________________________ -->
1376 <div class="doc_subsubsection">
1377   <a name="selectiondag_future">Future directions for the SelectionDAG</a>
1378 </div>
1379
1380 <div class="doc_text">
1381
1382 <ol>
1383   <li>Optional function-at-a-time selection.</li>
1384
1385   <li>Auto-generate entire selector from <tt>.td</tt> file.</li>
1386 </ol>
1387
1388 </div>
1389
1390 <!-- ======================================================================= -->
1391 <div class="doc_subsection">
1392   <a name="ssamco">SSA-based Machine Code Optimizations</a>
1393 </div>
1394 <div class="doc_text"><p>To Be Written</p></div>
1395
1396 <!-- ======================================================================= -->
1397 <div class="doc_subsection">
1398   <a name="liveintervals">Live Intervals</a>
1399 </div>
1400
1401 <div class="doc_text">
1402
1403 <p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
1404    They are used by some <a href="#regalloc">register allocator</a> passes to
1405    determine if two or more virtual registers which require the same physical
1406    register are live at the same point in the program (i.e., they conflict).
1407    When this situation occurs, one virtual register must be <i>spilled</i>.</p>
1408
1409 </div>
1410
1411 <!-- _______________________________________________________________________ -->
1412 <div class="doc_subsubsection">
1413   <a name="livevariable_analysis">Live Variable Analysis</a>
1414 </div>
1415
1416 <div class="doc_text">
1417
1418 <p>The first step in determining the live intervals of variables is to calculate
1419    the set of registers that are immediately dead after the instruction (i.e.,
1420    the instruction calculates the value, but it is never used) and the set of
1421    registers that are used by the instruction, but are never used after the
1422    instruction (i.e., they are killed). Live variable information is computed
1423    for each <i>virtual</i> register and <i>register allocatable</i> physical
1424    register in the function.  This is done in a very efficient manner because it
1425    uses SSA to sparsely compute lifetime information for virtual registers
1426    (which are in SSA form) and only has to track physical registers within a
1427    block.  Before register allocation, LLVM can assume that physical registers
1428    are only live within a single basic block.  This allows it to do a single,
1429    local analysis to resolve physical register lifetimes within each basic
1430    block. If a physical register is not register allocatable (e.g., a stack
1431    pointer or condition codes), it is not tracked.</p>
1432
1433 <p>Physical registers may be live in to or out of a function. Live in values are
1434    typically arguments in registers. Live out values are typically return values
1435    in registers. Live in values are marked as such, and are given a dummy
1436    "defining" instruction during live intervals analysis. If the last basic
1437    block of a function is a <tt>return</tt>, then it's marked as using all live
1438    out values in the function.</p>
1439
1440 <p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
1441    the live variable information from a depth first traversal of the CFG of the
1442    function won't guarantee that a virtual register used by the <tt>PHI</tt>
1443    node is defined before it's used. When a <tt>PHI</tt> node is encountered,
1444    only the definition is handled, because the uses will be handled in other
1445    basic blocks.</p>
1446
1447 <p>For each <tt>PHI</tt> node of the current basic block, we simulate an
1448    assignment at the end of the current basic block and traverse the successor
1449    basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
1450    the <tt>PHI</tt> node's operands is coming from the current basic block, then
1451    the variable is marked as <i>alive</i> within the current basic block and all
1452    of its predecessor basic blocks, until the basic block with the defining
1453    instruction is encountered.</p>
1454
1455 </div>
1456
1457 <!-- _______________________________________________________________________ -->
1458 <div class="doc_subsubsection">
1459   <a name="liveintervals_analysis">Live Intervals Analysis</a>
1460 </div>
1461
1462 <div class="doc_text">
1463
1464 <p>We now have the information available to perform the live intervals analysis
1465    and build the live intervals themselves.  We start off by numbering the basic
1466    blocks and machine instructions.  We then handle the "live-in" values.  These
1467    are in physical registers, so the physical register is assumed to be killed
1468    by the end of the basic block.  Live intervals for virtual registers are
1469    computed for some ordering of the machine instructions <tt>[1, N]</tt>.  A
1470    live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
1471    &lt; N</tt>, for which a variable is live.</p>
1472
1473 <p><i><b>More to come...</b></i></p>
1474
1475 </div>
1476
1477 <!-- ======================================================================= -->
1478 <div class="doc_subsection">
1479   <a name="regalloc">Register Allocation</a>
1480 </div>
1481
1482 <div class="doc_text">
1483
1484 <p>The <i>Register Allocation problem</i> consists in mapping a program
1485    <i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
1486    to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
1487    number of physical registers. Each target architecture has a different number
1488    of physical registers. If the number of physical registers is not enough to
1489    accommodate all the virtual registers, some of them will have to be mapped
1490    into memory. These virtuals are called <i>spilled virtuals</i>.</p>
1491
1492 </div>
1493
1494 <!-- _______________________________________________________________________ -->
1495
1496 <div class="doc_subsubsection">
1497   <a name="regAlloc_represent">How registers are represented in LLVM</a>
1498 </div>
1499
1500 <div class="doc_text">
1501
1502 <p>In LLVM, physical registers are denoted by integer numbers that normally
1503    range from 1 to 1023. To see how this numbering is defined for a particular
1504    architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
1505    architecture. For instance, by
1506    inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
1507    32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
1508    <tt>MM0</tt> is mapped to 48.</p>
1509
1510 <p>Some architectures contain registers that share the same physical location. A
1511    notable example is the X86 platform. For instance, in the X86 architecture,
1512    the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
1513    bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
1514    particular architecture, you can check which registers are aliased by
1515    inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
1516    <tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
1517    all the physical registers aliased to the register <tt>p_reg</tt>.</p>
1518
1519 <p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
1520    Elements in the same register class are functionally equivalent, and can be
1521    interchangeably used. Each virtual register can only be mapped to physical
1522    registers of a particular class. For instance, in the X86 architecture, some
1523    virtuals can only be allocated to 8 bit registers.  A register class is
1524    described by <tt>TargetRegisterClass</tt> objects.  To discover if a virtual
1525    register is compatible with a given physical, this code can be used:</p>
1526
1527 <div class="doc_code">
1528 <pre>
1529 bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
1530                                       unsigned v_reg,
1531                                       unsigned p_reg) {
1532   assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
1533          "Target register must be physical");
1534   const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
1535   return trc-&gt;contains(p_reg);
1536 }
1537 </pre>
1538 </div>
1539
1540 <p>Sometimes, mostly for debugging purposes, it is useful to change the number
1541    of physical registers available in the target architecture. This must be done
1542    statically, inside the <tt>TargetRegsterInfo.td</tt> file. Just <tt>grep</tt>
1543    for <tt>RegisterClass</tt>, the last parameter of which is a list of
1544    registers. Just commenting some out is one simple way to avoid them being
1545    used. A more polite way is to explicitly exclude some registers from
1546    the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
1547    class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
1548    </p>
1549
1550 <p>Virtual registers are also denoted by integer numbers. Contrary to physical
1551    registers, different virtual registers never share the same number. The
1552    smallest virtual register is normally assigned the number 1024. This may
1553    change, so, in order to know which is the first virtual register, you should
1554    access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
1555    number is greater than or equal
1556    to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
1557    register. Whereas physical registers are statically defined in
1558    a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
1559    application developer, that is not the case with virtual registers.  In order
1560    to create new virtual registers, use the
1561    method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
1562    will return a virtual register with the highest code.</p>
1563
1564 <p>Before register allocation, the operands of an instruction are mostly virtual
1565    registers, although physical registers may also be used. In order to check if
1566    a given machine operand is a register, use the boolean
1567    function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
1568    a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
1569    or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
1570    defines the registers 1024, and uses registers 1025 and 1026. Given a
1571    register operand, the method <tt>MachineOperand::isUse()</tt> informs if that
1572    register is being used by the instruction. The
1573    method <tt>MachineOperand::isDef()</tt> informs if that registers is being
1574    defined.</p>
1575
1576 <p>We will call physical registers present in the LLVM bitcode before register
1577    allocation <i>pre-colored registers</i>. Pre-colored registers are used in
1578    many different situations, for instance, to pass parameters of functions
1579    calls, and to store results of particular instructions. There are two types
1580    of pre-colored registers: the ones <i>implicitly</i> defined, and
1581    those <i>explicitly</i> defined. Explicitly defined registers are normal
1582    operands, and can be accessed
1583    with <tt>MachineInstr::getOperand(int)::getReg()</tt>.  In order to check
1584    which registers are implicitly defined by an instruction, use
1585    the <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
1586    where <tt>opcode</tt> is the opcode of the target instruction. One important
1587    difference between explicit and implicit physical registers is that the
1588    latter are defined statically for each instruction, whereas the former may
1589    vary depending on the program being compiled. For example, an instruction
1590    that represents a function call will always implicitly define or use the same
1591    set of physical registers. To read the registers implicitly used by an
1592    instruction,
1593    use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
1594    registers impose constraints on any register allocation algorithm. The
1595    register allocator must make sure that none of them are overwritten by
1596    the values of virtual registers while still alive.</p>
1597
1598 </div>
1599
1600 <!-- _______________________________________________________________________ -->
1601
1602 <div class="doc_subsubsection">
1603   <a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
1604 </div>
1605
1606 <div class="doc_text">
1607
1608 <p>There are two ways to map virtual registers to physical registers (or to
1609    memory slots). The first way, that we will call <i>direct mapping</i>, is
1610    based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
1611    and <tt>MachineOperand</tt>. The second way, that we will call <i>indirect
1612    mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
1613    and stores sending and getting values to and from memory.</p>
1614
1615 <p>The direct mapping provides more flexibility to the developer of the register
1616    allocator; however, it is more error prone, and demands more implementation
1617    work.  Basically, the programmer will have to specify where load and store
1618    instructions should be inserted in the target function being compiled in
1619    order to get and store values in memory. To assign a physical register to a
1620    virtual register present in a given operand,
1621    use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
1622    use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
1623    load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>
1624
1625 <p>The indirect mapping shields the application developer from the complexities
1626    of inserting load and store instructions. In order to map a virtual register
1627    to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>.  In
1628    order to map a certain virtual register to memory,
1629    use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
1630    the stack slot where <tt>vreg</tt>'s value will be located.  If it is
1631    necessary to map another virtual register to the same stack slot,
1632    use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
1633    important point to consider when using the indirect mapping, is that even if
1634    a virtual register is mapped to memory, it still needs to be mapped to a
1635    physical register. This physical register is the location where the virtual
1636    register is supposed to be found before being stored or after being
1637    reloaded.</p>
1638
1639 <p>If the indirect strategy is used, after all the virtual registers have been
1640    mapped to physical registers or stack slots, it is necessary to use a spiller
1641    object to place load and store instructions in the code. Every virtual that
1642    has been mapped to a stack slot will be stored to memory after been defined
1643    and will be loaded before being used. The implementation of the spiller tries
1644    to recycle load/store instructions, avoiding unnecessary instructions. For an
1645    example of how to invoke the spiller,
1646    see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
1647    in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
1648
1649 </div>
1650
1651 <!-- _______________________________________________________________________ -->
1652 <div class="doc_subsubsection">
1653   <a name="regAlloc_twoAddr">Handling two address instructions</a>
1654 </div>
1655
1656 <div class="doc_text">
1657
1658 <p>With very rare exceptions (e.g., function calls), the LLVM machine code
1659    instructions are three address instructions. That is, each instruction is
1660    expected to define at most one register, and to use at most two registers.
1661    However, some architectures use two address instructions. In this case, the
1662    defined register is also one of the used register. For instance, an
1663    instruction such as <tt>ADD %EAX, %EBX</tt>, in X86 is actually equivalent
1664    to <tt>%EAX = %EAX + %EBX</tt>.</p>
1665
1666 <p>In order to produce correct code, LLVM must convert three address
1667    instructions that represent two address instructions into true two address
1668    instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
1669    this specific purpose. It must be run before register allocation takes
1670    place. After its execution, the resulting code may no longer be in SSA
1671    form. This happens, for instance, in situations where an instruction such
1672    as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
1673
1674 <div class="doc_code">
1675 <pre>
1676 %a = MOVE %b
1677 %a = ADD %a %c
1678 </pre>
1679 </div>
1680
1681 <p>Notice that, internally, the second instruction is represented as
1682    <tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
1683    used and defined by the instruction.</p>
1684
1685 </div>
1686
1687 <!-- _______________________________________________________________________ -->
1688 <div class="doc_subsubsection">
1689   <a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
1690 </div>
1691
1692 <div class="doc_text">
1693
1694 <p>An important transformation that happens during register allocation is called
1695    the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
1696    that are performed on the control flow graph of programs. However,
1697    traditional instruction sets do not implement PHI instructions. Thus, in
1698    order to generate executable code, compilers must replace PHI instructions
1699    with other instructions that preserve their semantics.</p>
1700
1701 <p>There are many ways in which PHI instructions can safely be removed from the
1702    target code. The most traditional PHI deconstruction algorithm replaces PHI
1703    instructions with copy instructions. That is the strategy adopted by
1704    LLVM. The SSA deconstruction algorithm is implemented
1705    in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
1706    identifier <tt>PHIEliminationID</tt> must be marked as required in the code
1707    of the register allocator.</p>
1708
1709 </div>
1710
1711 <!-- _______________________________________________________________________ -->
1712 <div class="doc_subsubsection">
1713   <a name="regAlloc_fold">Instruction folding</a>
1714 </div>
1715
1716 <div class="doc_text">
1717
1718 <p><i>Instruction folding</i> is an optimization performed during register
1719    allocation that removes unnecessary copy instructions. For instance, a
1720    sequence of instructions such as:</p>
1721
1722 <div class="doc_code">
1723 <pre>
1724 %EBX = LOAD %mem_address
1725 %EAX = COPY %EBX
1726 </pre>
1727 </div>
1728
1729 <p>can be safely substituted by the single instruction:</p>
1730
1731 <div class="doc_code">
1732 <pre>
1733 %EAX = LOAD %mem_address
1734 </pre>
1735 </div>
1736
1737 <p>Instructions can be folded with
1738    the <tt>TargetRegisterInfo::foldMemoryOperand(...)</tt> method. Care must be
1739    taken when folding instructions; a folded instruction can be quite different
1740    from the original
1741    instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
1742    in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
1743    use.</p>
1744
1745 </div>
1746
1747 <!-- _______________________________________________________________________ -->
1748
1749 <div class="doc_subsubsection">
1750   <a name="regAlloc_builtIn">Built in register allocators</a>
1751 </div>
1752
1753 <div class="doc_text">
1754
1755 <p>The LLVM infrastructure provides the application developer with three
1756    different register allocators:</p>
1757
1758 <ul>
1759   <li><i>Linear Scan</i> &mdash; <i>The default allocator</i>. This is the
1760       well-know linear scan register allocator. Whereas the
1761       <i>Simple</i> and <i>Local</i> algorithms use a direct mapping
1762       implementation technique, the <i>Linear Scan</i> implementation
1763       uses a spiller in order to place load and stores.</li>
1764
1765   <li><i>Fast</i> &mdash; This register allocator is the default for debug
1766       builds. It allocates registers on a basic block level, attempting to keep
1767       values in registers and reusing registers as appropriate.</li>
1768
1769   <li><i>PBQP</i> &mdash; A Partitioned Boolean Quadratic Programming (PBQP)
1770       based register allocator. This allocator works by constructing a PBQP
1771       problem representing the register allocation problem under consideration,
1772       solving this using a PBQP solver, and mapping the solution back to a
1773       register assignment.</li>
1774
1775 </ul>
1776
1777 <p>The type of register allocator used in <tt>llc</tt> can be chosen with the
1778    command line option <tt>-regalloc=...</tt>:</p>
1779
1780 <div class="doc_code">
1781 <pre>
1782 $ llc -regalloc=linearscan file.bc -o ln.s;
1783 $ llc -regalloc=fast file.bc -o fa.s;
1784 $ llc -regalloc=pbqp file.bc -o pbqp.s;
1785 </pre>
1786 </div>
1787
1788 </div>
1789
1790 <!-- ======================================================================= -->
1791 <div class="doc_subsection">
1792   <a name="proepicode">Prolog/Epilog Code Insertion</a>
1793 </div>
1794 <div class="doc_text"><p>To Be Written</p></div>
1795 <!-- ======================================================================= -->
1796 <div class="doc_subsection">
1797   <a name="latemco">Late Machine Code Optimizations</a>
1798 </div>
1799 <div class="doc_text"><p>To Be Written</p></div>
1800
1801 <!-- ======================================================================= -->
1802 <div class="doc_subsection">
1803   <a name="codeemit">Code Emission</a>
1804 </div>
1805
1806 <div class="doc_text">
1807
1808 <p>The code emission step of code generation is responsible for lowering from
1809 the code generator abstractions (like <a
1810 href="#machinefunction">MachineFunction</a>, <a
1811 href="#machineinstr">MachineInstr</a>, etc) down
1812 to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
1813 <a href="#mcstreamer">MCStreamer</a>, etc).  This is
1814 done with a combination of several different classes: the (misnamed)
1815 target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
1816 (such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
1817
1818 <p>Since the MC layer works at the level of abstraction of object files, it
1819 doesn't have a notion of functions, global variables etc.  Instead, it thinks
1820 about labels, directives, and instructions.  A key class used at this time is
1821 the MCStreamer class.  This is an abstract API that is implemented in different
1822 ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
1823 an "assembler API".  MCStreamer has one method per directive, such as EmitLabel,
1824 EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
1825 level directives.
1826 </p>
1827
1828 <p>If you are interested in implementing a code generator for a target, there
1829 are three important things that you have to implement for your target:</p>
1830
1831 <ol>
1832 <li>First, you need a subclass of AsmPrinter for your target.  This class
1833 implements the general lowering process converting MachineFunction's into MC
1834 label constructs.  The AsmPrinter base class provides a number of useful methods
1835 and routines, and also allows you to override the lowering process in some
1836 important ways.  You should get much of the lowering for free if you are
1837 implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
1838 class implements much of the common logic.</li>
1839
1840 <li>Second, you need to implement an instruction printer for your target.  The
1841 instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
1842 raw_ostream as text.  Most of this is automatically generated from the .td file
1843 (when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
1844 instructions), but you need to implement routines to print operands.</li>
1845
1846 <li>Third, you need to implement code that lowers a <a
1847 href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
1848 "&lt;target&gt;MCInstLower.cpp".  This lowering process is often target
1849 specific, and is responsible for turning jump table entries, constant pool
1850 indices, global variable addresses, etc into MCLabels as appropriate.  This
1851 translation layer is also responsible for expanding pseudo ops used by the code
1852 generator into the actual machine instructions they correspond to. The MCInsts
1853 that are generated by this are fed into the instruction printer or the encoder.
1854 </li>
1855
1856 </ol>
1857
1858 <p>Finally, at your choosing, you can also implement an subclass of
1859 MCCodeEmitter which lowers MCInst's into machine code bytes and relocations.
1860 This is important if you want to support direct .o file emission, or would like
1861 to implement an assembler for your target.</p>
1862
1863 </div>
1864
1865
1866 <!-- *********************************************************************** -->
1867 <div class="doc_section">
1868   <a name="nativeassembler">Implementing a Native Assembler</a>
1869 </div>
1870 <!-- *********************************************************************** -->
1871
1872 <div class="doc_text">
1873
1874 <p>Though you're probably reading this because you want to write or maintain a
1875 compiler backend, LLVM also fully supports building a native assemblers too.
1876 We've tried hard to automate the generation of the assembler from the .td files
1877 (in particular the instruction syntax and encodings), which means that a large
1878 part of the manual and repetitive data entry can be factored and shared with the
1879 compiler.</p>
1880
1881
1882
1883 </div>
1884
1885
1886 <!-- ======================================================================= -->
1887 <div class="doc_subsection">
1888   <a name="proepicode">Prolog/Epilog Code Insertion</a>
1889 </div>
1890 <div class="doc_text"><p>To Be Written</p></div>
1891
1892
1893
1894
1895 <!-- *********************************************************************** -->
1896 <div class="doc_section">
1897   <a name="targetimpls">Target-specific Implementation Notes</a>
1898 </div>
1899 <!-- *********************************************************************** -->
1900
1901 <div class="doc_text">
1902
1903 <p>This section of the document explains features or design decisions that are
1904    specific to the code generator for a particular target.  First we start
1905    with a table that summarizes what features are supported by each target.</p>
1906
1907 </div>
1908
1909 <!-- ======================================================================= -->
1910 <div class="doc_subsection">
1911   <a name="targetfeatures">Target Feature Matrix</a>
1912 </div>
1913
1914 <style type="text/css">
1915   .unknown { background-color: #C0C0C0; text-align: center; }
1916   .unknown:before { content: "?" }
1917   .no { background-color: #C11B17 }
1918   .no:before { content: "N" }
1919   .partial { background-color: #F88017 }
1920   .yes { background-color: #00FF00; }
1921   .yes:before { content: "Y" }
1922 </style>
1923
1924
1925 <div class="doc_text">
1926
1927 <p>Note that this table does not include the C backend or Cpp backends, since
1928 they do not use the target independent code generator infrastructure.  It also
1929 doesn't list features that are not supported fully by any target yet.  It
1930 considers a feature to be supported if at least one subtarget supports it.  A
1931 feature being supported means that it is useful and works for most cases, it
1932 does not indicate that there are zero known bugs in the implementation.  Here
1933 is the key:</p>
1934
1935
1936 <table border="1" cellspacing="0">
1937   <tr>
1938     <th>Unknown</th>
1939     <th>No support</th>
1940     <th>Partial Support</th>
1941     <th>Complete Support</th>
1942   </tr>
1943   <tr>
1944     <td class="unknown"></td>
1945     <td class="no"></td>
1946     <td class="partial"></td>
1947     <td class="yes"></td>
1948   </tr>
1949 </table>
1950
1951 <p>Here is the table:</p>
1952
1953 <table width="689" border="1" cellspacing="0">
1954 <tr><td></td>
1955 <td colspan="13" align="center" bgcolor="#ffffcc">Target</td>
1956 </tr>
1957   <tr>
1958     <th>Feature</th>
1959     <th>ARM</th>
1960     <th>Alpha</th>
1961     <th>Blackfin</th>
1962     <th>CellSPU</th>
1963     <th>MBlaze</th>
1964     <th>MSP430</th>
1965     <th>Mips</th>
1966     <th>PTX</th>
1967     <th>PowerPC</th>
1968     <th>Sparc</th>
1969     <th>SystemZ</th>
1970     <th>X86</th>
1971     <th>XCore</th>
1972   </tr>
1973
1974 <tr>
1975   <td><a href="#feat_reliable">is generally reliable</a></td>
1976   <td class="yes"></td> <!-- ARM -->
1977   <td class="unknown"></td> <!-- Alpha -->
1978   <td class="no"></td> <!-- Blackfin -->
1979   <td class="no"></td> <!-- CellSPU -->
1980   <td class="no"></td> <!-- MBlaze -->
1981   <td class="unknown"></td> <!-- MSP430 -->
1982   <td class="unknown"></td> <!-- Mips -->
1983   <td class="no"></td> <!-- PTX -->
1984   <td class="yes"></td> <!-- PowerPC -->
1985   <td class="yes"></td> <!-- Sparc -->
1986   <td class="unknown"></td> <!-- SystemZ -->
1987   <td class="yes"></td> <!-- X86 -->
1988   <td class="unknown"></td> <!-- XCore -->
1989 </tr>
1990
1991 <tr>
1992   <td><a href="#feat_asmparser">assembly parser</a></td>
1993   <td class="no"></td> <!-- ARM -->
1994   <td class="no"></td> <!-- Alpha -->
1995   <td class="no"></td> <!-- Blackfin -->
1996   <td class="no"></td> <!-- CellSPU -->
1997   <td class="no"></td> <!-- MBlaze -->
1998   <td class="no"></td> <!-- MSP430 -->
1999   <td class="no"></td> <!-- Mips -->
2000   <td class="no"></td> <!-- PTX -->
2001   <td class="no"></td> <!-- PowerPC -->
2002   <td class="no"></td> <!-- Sparc -->
2003   <td class="no"></td> <!-- SystemZ -->
2004   <td class="yes"></td> <!-- X86 -->
2005   <td class="no"></td> <!-- XCore -->
2006 </tr>
2007
2008 <tr>
2009   <td><a href="#feat_disassembler">disassembler</a></td>
2010   <td class="yes"></td> <!-- ARM -->
2011   <td class="no"></td> <!-- Alpha -->
2012   <td class="no"></td> <!-- Blackfin -->
2013   <td class="no"></td> <!-- CellSPU -->
2014   <td class="no"></td> <!-- MBlaze -->
2015   <td class="no"></td> <!-- MSP430 -->
2016   <td class="no"></td> <!-- Mips -->
2017   <td class="no"></td> <!-- PTX -->
2018   <td class="no"></td> <!-- PowerPC -->
2019   <td class="no"></td> <!-- Sparc -->
2020   <td class="no"></td> <!-- SystemZ -->
2021   <td class="yes"></td> <!-- X86 -->
2022   <td class="no"></td> <!-- XCore -->
2023 </tr>
2024
2025 <tr>
2026   <td><a href="#feat_inlineasm">inline asm</a></td>
2027   <td class="yes"></td> <!-- ARM -->
2028   <td class="unknown"></td> <!-- Alpha -->
2029   <td class="yes"></td> <!-- Blackfin -->
2030   <td class="no"></td> <!-- CellSPU -->
2031   <td class="no"></td> <!-- MBlaze -->
2032   <td class="unknown"></td> <!-- MSP430 -->
2033   <td class="unknown"></td> <!-- Mips -->
2034   <td class="unknown"></td> <!-- PTX -->
2035   <td class="yes"></td> <!-- PowerPC -->
2036   <td class="unknown"></td> <!-- Sparc -->
2037   <td class="unknown"></td> <!-- SystemZ -->
2038   <td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 -->
2039   <td class="unknown"></td> <!-- XCore -->
2040 </tr>
2041
2042 <tr>
2043   <td><a href="#feat_jit">jit</a></td>
2044   <td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
2045   <td class="unknown"></td> <!-- Alpha -->
2046   <td class="no"></td> <!-- Blackfin -->
2047   <td class="no"></td> <!-- CellSPU -->
2048   <td class="no"></td> <!-- MBlaze -->
2049   <td class="unknown"></td> <!-- MSP430 -->
2050   <td class="unknown"></td> <!-- Mips -->
2051   <td class="unknown"></td> <!-- PTX -->
2052   <td class="yes"></td> <!-- PowerPC -->
2053   <td class="unknown"></td> <!-- Sparc -->
2054   <td class="unknown"></td> <!-- SystemZ -->
2055   <td class="yes"></td> <!-- X86 -->
2056   <td class="unknown"></td> <!-- XCore -->
2057 </tr>
2058
2059 <tr>
2060   <td><a href="#feat_objectwrite">.o&nbsp;file writing</a></td>
2061   <td class="no"></td> <!-- ARM -->
2062   <td class="no"></td> <!-- Alpha -->
2063   <td class="no"></td> <!-- Blackfin -->
2064   <td class="no"></td> <!-- CellSPU -->
2065   <td class="no"></td> <!-- MBlaze -->
2066   <td class="no"></td> <!-- MSP430 -->
2067   <td class="no"></td> <!-- Mips -->
2068   <td class="no"></td> <!-- PTX -->
2069   <td class="no"></td> <!-- PowerPC -->
2070   <td class="no"></td> <!-- Sparc -->
2071   <td class="no"></td> <!-- SystemZ -->
2072   <td class="yes"></td> <!-- X86 -->
2073   <td class="no"></td> <!-- XCore -->
2074 </tr>
2075
2076 <tr>
2077   <td><a href="#feat_tailcall">tail calls</a></td>
2078   <td class="yes"></td> <!-- ARM -->
2079   <td class="unknown"></td> <!-- Alpha -->
2080   <td class="no"></td> <!-- Blackfin -->
2081   <td class="no"></td> <!-- CellSPU -->
2082   <td class="no"></td> <!-- MBlaze -->
2083   <td class="unknown"></td> <!-- MSP430 -->
2084   <td class="unknown"></td> <!-- Mips -->
2085   <td class="unknown"></td> <!-- PTX -->
2086   <td class="yes"></td> <!-- PowerPC -->
2087   <td class="unknown"></td> <!-- Sparc -->
2088   <td class="unknown"></td> <!-- SystemZ -->
2089   <td class="yes"></td> <!-- X86 -->
2090   <td class="unknown"></td> <!-- XCore -->
2091 </tr>
2092
2093
2094 </table>
2095
2096 </div>
2097
2098 <!-- _______________________________________________________________________ -->
2099 <div class="doc_subsubsection" id="feat_reliable">Is Generally Reliable</div>
2100
2101 <div class="doc_text">
2102 <p>This box indicates whether the target is considered to be production quality.
2103 This indicates that the target has been used as a static compiler to
2104 compile large amounts of code by a variety of different people and is in
2105 continuous use.</p>
2106 </div>
2107
2108 <!-- _______________________________________________________________________ -->
2109 <div class="doc_subsubsection" id="feat_asmparser">Assembly Parser</div>
2110
2111 <div class="doc_text">
2112 <p>This box indicates whether the target supports parsing target specific .s
2113 files by implementing the MCAsmParser interface.  This is required for llvm-mc
2114 to be able to act as a native assembler and is required for inline assembly
2115 support in the native .o file writer.</p>
2116
2117 </div>
2118
2119
2120 <!-- _______________________________________________________________________ -->
2121 <div class="doc_subsubsection" id="feat_disassembler">Disassembler</div>
2122
2123 <div class="doc_text">
2124 <p>This box indicates whether the target supports the MCDisassembler API for
2125 disassembling machine opcode bytes into MCInst's.</p>
2126
2127 </div>
2128
2129 <!-- _______________________________________________________________________ -->
2130 <div class="doc_subsubsection" id="feat_inlineasm">Inline Asm</div>
2131
2132 <div class="doc_text">
2133 <p>This box indicates whether the target supports most popular inline assembly
2134 constraints and modifiers.</p>
2135
2136 <p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly
2137 constraints relating to the X86 floating point stack.</p>
2138
2139 </div>
2140
2141 <!-- _______________________________________________________________________ -->
2142 <div class="doc_subsubsection" id="feat_jit">JIT Support</div>
2143
2144 <div class="doc_text">
2145 <p>This box indicates whether the target supports the JIT compiler through
2146 the ExecutionEngine interface.</p>
2147
2148 <p id="feat_jit_arm">The ARM backend has basic support for integer code
2149 in ARM codegen mode, but lacks NEON and full Thumb support.</p>
2150
2151 </div>
2152
2153 <!-- _______________________________________________________________________ -->
2154 <div class="doc_subsubsection" id="feat_objectwrite">.o File Writing</div>
2155
2156 <div class="doc_text">
2157
2158 <p>This box indicates whether the target supports writing .o files (e.g. MachO,
2159 ELF, and/or COFF) files directly from the target.  Note that the target also
2160 must include an assembly parser and general inline assembly support for full
2161 inline assembly support in the .o writer.</p>
2162
2163 <p>Targets that don't support this feature can obviously still write out .o
2164 files, they just rely on having an external assembler to translate from a .s
2165 file to a .o file (as is the case for many C compilers).</p>
2166
2167 </div>
2168
2169 <!-- _______________________________________________________________________ -->
2170 <div class="doc_subsubsection" id="feat_tailcall">Tail Calls</div>
2171
2172 <div class="doc_text">
2173
2174 <p>This box indicates whether the target supports guaranteed tail calls.  These
2175 are calls marked "<a href="LangRef.html#i_call">tail</a>" and use the fastcc
2176 calling convention.  Please see the <a href="#tailcallopt">tail call section
2177 more more details</a>.</p>
2178
2179 </div>
2180
2181
2182
2183
2184 <!-- ======================================================================= -->
2185 <div class="doc_subsection">
2186   <a name="tailcallopt">Tail call optimization</a>
2187 </div>
2188
2189 <div class="doc_text">
2190
2191 <p>Tail call optimization, callee reusing the stack of the caller, is currently
2192    supported on x86/x86-64 and PowerPC. It is performed if:</p>
2193
2194 <ul>
2195   <li>Caller and callee have the calling convention <tt>fastcc</tt> or
2196        <tt>cc 10</tt> (GHC call convention).</li>
2197
2198   <li>The call is a tail call - in tail position (ret immediately follows call
2199       and ret uses value of call or is void).</li>
2200
2201   <li>Option <tt>-tailcallopt</tt> is enabled.</li>
2202
2203   <li>Platform specific constraints are met.</li>
2204 </ul>
2205
2206 <p>x86/x86-64 constraints:</p>
2207
2208 <ul>
2209   <li>No variable argument lists are used.</li>
2210
2211   <li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
2212   = hidden or protected) are supported.</li>
2213 </ul>
2214
2215 <p>PowerPC constraints:</p>
2216
2217 <ul>
2218   <li>No variable argument lists are used.</li>
2219
2220   <li>No byval parameters are used.</li>
2221
2222   <li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
2223 </ul>
2224
2225 <p>Example:</p>
2226
2227 <p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
2228
2229 <div class="doc_code">
2230 <pre>
2231 declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
2232
2233 define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
2234   %l1 = add i32 %in1, %in2
2235   %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
2236   ret i32 %tmp
2237 }
2238 </pre>
2239 </div>
2240
2241 <p>Implications of <tt>-tailcallopt</tt>:</p>
2242
2243 <p>To support tail call optimization in situations where the callee has more
2244    arguments than the caller a 'callee pops arguments' convention is used. This
2245    currently causes each <tt>fastcc</tt> call that is not tail call optimized
2246    (because one or more of above constraints are not met) to be followed by a
2247    readjustment of the stack. So performance might be worse in such cases.</p>
2248
2249 </div>
2250 <!-- ======================================================================= -->
2251 <div class="doc_subsection">
2252   <a name="sibcallopt">Sibling call optimization</a>
2253 </div>
2254
2255 <div class="doc_text">
2256
2257 <p>Sibling call optimization is a restricted form of tail call optimization.
2258    Unlike tail call optimization described in the previous section, it can be
2259    performed automatically on any tail calls when <tt>-tailcallopt</tt> option
2260    is not specified.</p>
2261
2262 <p>Sibling call optimization is currently performed on x86/x86-64 when the
2263    following constraints are met:</p>
2264
2265 <ul>
2266   <li>Caller and callee have the same calling convention. It can be either
2267       <tt>c</tt> or <tt>fastcc</tt>.
2268
2269   <li>The call is a tail call - in tail position (ret immediately follows call
2270       and ret uses value of call or is void).</li>
2271
2272   <li>Caller and callee have matching return type or the callee result is not
2273       used.
2274
2275   <li>If any of the callee arguments are being passed in stack, they must be
2276       available in caller's own incoming argument stack and the frame offsets
2277       must be the same.
2278 </ul>
2279
2280 <p>Example:</p>
2281 <div class="doc_code">
2282 <pre>
2283 declare i32 @bar(i32, i32)
2284
2285 define i32 @foo(i32 %a, i32 %b, i32 %c) {
2286 entry:
2287   %0 = tail call i32 @bar(i32 %a, i32 %b)
2288   ret i32 %0
2289 }
2290 </pre>
2291 </div>
2292
2293 </div>
2294 <!-- ======================================================================= -->
2295 <div class="doc_subsection">
2296   <a name="x86">The X86 backend</a>
2297 </div>
2298
2299 <div class="doc_text">
2300
2301 <p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory.  This
2302    code generator is capable of targeting a variety of x86-32 and x86-64
2303    processors, and includes support for ISA extensions such as MMX and SSE.</p>
2304
2305 </div>
2306
2307 <!-- _______________________________________________________________________ -->
2308 <div class="doc_subsubsection">
2309   <a name="x86_tt">X86 Target Triples supported</a>
2310 </div>
2311
2312 <div class="doc_text">
2313
2314 <p>The following are the known target triples that are supported by the X86
2315    backend.  This is not an exhaustive list, and it would be useful to add those
2316    that people test.</p>
2317
2318 <ul>
2319   <li><b>i686-pc-linux-gnu</b> &mdash; Linux</li>
2320
2321   <li><b>i386-unknown-freebsd5.3</b> &mdash; FreeBSD 5.3</li>
2322
2323   <li><b>i686-pc-cygwin</b> &mdash; Cygwin on Win32</li>
2324
2325   <li><b>i686-pc-mingw32</b> &mdash; MingW on Win32</li>
2326
2327   <li><b>i386-pc-mingw32msvc</b> &mdash; MingW crosscompiler on Linux</li>
2328
2329   <li><b>i686-apple-darwin*</b> &mdash; Apple Darwin on X86</li>
2330
2331   <li><b>x86_64-unknown-linux-gnu</b> &mdash; Linux</li>
2332 </ul>
2333
2334 </div>
2335
2336 <!-- _______________________________________________________________________ -->
2337 <div class="doc_subsubsection">
2338   <a name="x86_cc">X86 Calling Conventions supported</a>
2339 </div>
2340
2341
2342 <div class="doc_text">
2343
2344 <p>The following target-specific calling conventions are known to backend:</p>
2345
2346 <ul>
2347   <li><b>x86_StdCall</b> &mdash; stdcall calling convention seen on Microsoft
2348       Windows platform (CC ID = 64).</li>
2349
2350   <li><b>x86_FastCall</b> &mdash; fastcall calling convention seen on Microsoft
2351       Windows platform (CC ID = 65).</li>
2352 </ul>
2353
2354 </div>
2355
2356 <!-- _______________________________________________________________________ -->
2357 <div class="doc_subsubsection">
2358   <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
2359 </div>
2360
2361 <div class="doc_text">
2362
2363 <p>The x86 has a very flexible way of accessing memory.  It is capable of
2364    forming memory addresses of the following expression directly in integer
2365    instructions (which use ModR/M addressing):</p>
2366
2367 <div class="doc_code">
2368 <pre>
2369 SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
2370 </pre>
2371 </div>
2372
2373 <p>In order to represent this, LLVM tracks no less than 5 operands for each
2374    memory operand of this form.  This means that the "load" form of
2375    '<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
2376
2377 <div class="doc_code">
2378 <pre>
2379 Index:        0     |    1        2       3           4          5
2380 Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
2381 OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
2382 </pre>
2383 </div>
2384
2385 <p>Stores, and all other instructions, treat the four memory operands in the
2386    same way and in the same order.  If the segment register is unspecified
2387    (regno = 0), then no segment override is generated.  "Lea" operations do not
2388    have a segment register specified, so they only have 4 operands for their
2389    memory reference.</p>
2390
2391 </div>
2392
2393 <!-- _______________________________________________________________________ -->
2394 <div class="doc_subsubsection">
2395   <a name="x86_memory">X86 address spaces supported</a>
2396 </div>
2397
2398 <div class="doc_text">
2399
2400 <p>x86 has an experimental feature which provides
2401    the ability to perform loads and stores to different address spaces
2402    via the x86 segment registers.  A segment override prefix byte on an
2403    instruction causes the instruction's memory access to go to the specified
2404    segment.  LLVM address space 0 is the default address space, which includes
2405    the stack, and any unqualified memory accesses in a program.  Address spaces
2406    1-255 are currently reserved for user-defined code.  The GS-segment is
2407    represented by address space 256, while the FS-segment is represented by
2408    address space 257. Other x86 segments have yet to be allocated address space
2409    numbers.</p>
2410
2411 <p>While these address spaces may seem similar to TLS via the
2412    <tt>thread_local</tt> keyword, and often use the same underlying hardware,
2413    there are some fundamental differences.</p>
2414
2415 <p>The <tt>thread_local</tt> keyword applies to global variables and
2416    specifies that they are to be allocated in thread-local memory. There are
2417    no type qualifiers involved, and these variables can be pointed to with
2418    normal pointers and accessed with normal loads and stores.
2419    The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
2420    level (though LLVM doesn't yet have implementations of it for some
2421    configurations).<p>
2422
2423 <p>Special address spaces, in contrast, apply to static types. Every
2424    load and store has a particular address space in its address operand type,
2425    and this is what determines which address space is accessed.
2426    LLVM ignores these special address space qualifiers on global variables,
2427    and does not provide a way to directly allocate storage in them.
2428    At the LLVM IR level, the behavior of these special address spaces depends
2429    in part on the underlying OS or runtime environment, and they are specific
2430    to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
2431
2432 <p>Some operating systems and runtime environments use (or may in the future
2433    use) the FS/GS-segment registers for various low-level purposes, so care
2434    should be taken when considering them.</p>
2435
2436 </div>
2437
2438 <!-- _______________________________________________________________________ -->
2439 <div class="doc_subsubsection">
2440   <a name="x86_names">Instruction naming</a>
2441 </div>
2442
2443 <div class="doc_text">
2444
2445 <p>An instruction name consists of the base name, a default operand size, and a
2446    a character per operand with an optional special size. For example:</p>
2447
2448 <div class="doc_code">
2449 <pre>
2450 ADD8rr      -&gt; add, 8-bit register, 8-bit register
2451 IMUL16rmi   -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate
2452 IMUL16rmi8  -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate
2453 MOVSX32rm16 -&gt; movsx, 32-bit register, 16-bit memory
2454 </pre>
2455 </div>
2456
2457 </div>
2458
2459 <!-- ======================================================================= -->
2460 <div class="doc_subsection">
2461   <a name="ppc">The PowerPC backend</a>
2462 </div>
2463
2464 <div class="doc_text">
2465
2466 <p>The PowerPC code generator lives in the lib/Target/PowerPC directory.  The
2467    code generation is retargetable to several variations or <i>subtargets</i> of
2468    the PowerPC ISA; including ppc32, ppc64 and altivec.</p>
2469
2470 </div>
2471
2472 <!-- _______________________________________________________________________ -->
2473 <div class="doc_subsubsection">
2474   <a name="ppc_abi">LLVM PowerPC ABI</a>
2475 </div>
2476
2477 <div class="doc_text">
2478
2479 <p>LLVM follows the AIX PowerPC ABI, with two deviations. LLVM uses a PC
2480    relative (PIC) or static addressing for accessing global values, so no TOC
2481    (r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
2482    of a stack frame.  LLVM takes advantage of having no TOC to provide space to
2483    save the frame pointer in the PowerPC linkage area of the caller frame.
2484    Other details of PowerPC ABI can be found at <a href=
2485    "http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
2486    >PowerPC ABI.</a> Note: This link describes the 32 bit ABI.  The 64 bit ABI
2487    is similar except space for GPRs are 8 bytes wide (not 4) and r13 is reserved
2488    for system use.</p>
2489
2490 </div>
2491
2492 <!-- _______________________________________________________________________ -->
2493 <div class="doc_subsubsection">
2494   <a name="ppc_frame">Frame Layout</a>
2495 </div>
2496
2497 <div class="doc_text">
2498
2499 <p>The size of a PowerPC frame is usually fixed for the duration of a
2500    function's invocation.  Since the frame is fixed size, all references
2501    into the frame can be accessed via fixed offsets from the stack pointer.  The
2502    exception to this is when dynamic alloca or variable sized arrays are
2503    present, then a base pointer (r31) is used as a proxy for the stack pointer
2504    and stack pointer is free to grow or shrink.  A base pointer is also used if
2505    llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
2506    always aligned to 16 bytes, so that space allocated for altivec vectors will
2507    be properly aligned.</p>
2508
2509 <p>An invocation frame is laid out as follows (low memory at top);</p>
2510
2511 <table class="layout">
2512   <tr>
2513     <td>Linkage<br><br></td>
2514   </tr>
2515   <tr>
2516     <td>Parameter area<br><br></td>
2517   </tr>
2518   <tr>
2519     <td>Dynamic area<br><br></td>
2520   </tr>
2521   <tr>
2522     <td>Locals area<br><br></td>
2523   </tr>
2524   <tr>
2525     <td>Saved registers area<br><br></td>
2526   </tr>
2527   <tr style="border-style: none hidden none hidden;">
2528     <td><br></td>
2529   </tr>
2530   <tr>
2531     <td>Previous Frame<br><br></td>
2532   </tr>
2533 </table>
2534
2535 <p>The <i>linkage</i> area is used by a callee to save special registers prior
2536    to allocating its own frame.  Only three entries are relevant to LLVM. The
2537    first entry is the previous stack pointer (sp), aka link.  This allows
2538    probing tools like gdb or exception handlers to quickly scan the frames in
2539    the stack.  A function epilog can also use the link to pop the frame from the
2540    stack.  The third entry in the linkage area is used to save the return
2541    address from the lr register. Finally, as mentioned above, the last entry is
2542    used to save the previous frame pointer (r31.)  The entries in the linkage
2543    area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
2544    mode and 48 bytes in 64 bit mode.</p>
2545
2546 <p>32 bit linkage area</p>
2547
2548 <table class="layout">
2549   <tr>
2550     <td>0</td>
2551     <td>Saved SP (r1)</td>
2552   </tr>
2553   <tr>
2554     <td>4</td>
2555     <td>Saved CR</td>
2556   </tr>
2557   <tr>
2558     <td>8</td>
2559     <td>Saved LR</td>
2560   </tr>
2561   <tr>
2562     <td>12</td>
2563     <td>Reserved</td>
2564   </tr>
2565   <tr>
2566     <td>16</td>
2567     <td>Reserved</td>
2568   </tr>
2569   <tr>
2570     <td>20</td>
2571     <td>Saved FP (r31)</td>
2572   </tr>
2573 </table>
2574
2575 <p>64 bit linkage area</p>
2576
2577 <table class="layout">
2578   <tr>
2579     <td>0</td>
2580     <td>Saved SP (r1)</td>
2581   </tr>
2582   <tr>
2583     <td>8</td>
2584     <td>Saved CR</td>
2585   </tr>
2586   <tr>
2587     <td>16</td>
2588     <td>Saved LR</td>
2589   </tr>
2590   <tr>
2591     <td>24</td>
2592     <td>Reserved</td>
2593   </tr>
2594   <tr>
2595     <td>32</td>
2596     <td>Reserved</td>
2597   </tr>
2598   <tr>
2599     <td>40</td>
2600     <td>Saved FP (r31)</td>
2601   </tr>
2602 </table>
2603
2604 <p>The <i>parameter area</i> is used to store arguments being passed to a callee
2605    function.  Following the PowerPC ABI, the first few arguments are actually
2606    passed in registers, with the space in the parameter area unused.  However,
2607    if there are not enough registers or the callee is a thunk or vararg
2608    function, these register arguments can be spilled into the parameter area.
2609    Thus, the parameter area must be large enough to store all the parameters for
2610    the largest call sequence made by the caller.  The size must also be
2611    minimally large enough to spill registers r3-r10.  This allows callees blind
2612    to the call signature, such as thunks and vararg functions, enough space to
2613    cache the argument registers.  Therefore, the parameter area is minimally 32
2614    bytes (64 bytes in 64 bit mode.)  Also note that since the parameter area is
2615    a fixed offset from the top of the frame, that a callee can access its spilt
2616    arguments using fixed offsets from the stack pointer (or base pointer.)</p>
2617
2618 <p>Combining the information about the linkage, parameter areas and alignment. A
2619    stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
2620    mode.</p>
2621
2622 <p>The <i>dynamic area</i> starts out as size zero.  If a function uses dynamic
2623    alloca then space is added to the stack, the linkage and parameter areas are
2624    shifted to top of stack, and the new space is available immediately below the
2625    linkage and parameter areas.  The cost of shifting the linkage and parameter
2626    areas is minor since only the link value needs to be copied.  The link value
2627    can be easily fetched by adding the original frame size to the base pointer.
2628    Note that allocations in the dynamic space need to observe 16 byte
2629    alignment.</p>
2630
2631 <p>The <i>locals area</i> is where the llvm compiler reserves space for local
2632    variables.</p>
2633
2634 <p>The <i>saved registers area</i> is where the llvm compiler spills callee
2635    saved registers on entry to the callee.</p>
2636
2637 </div>
2638
2639 <!-- _______________________________________________________________________ -->
2640 <div class="doc_subsubsection">
2641   <a name="ppc_prolog">Prolog/Epilog</a>
2642 </div>
2643
2644 <div class="doc_text">
2645
2646 <p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
2647    the following exceptions.  Callee saved registers are spilled after the frame
2648    is created.  This allows the llvm epilog/prolog support to be common with
2649    other targets.  The base pointer callee saved register r31 is saved in the
2650    TOC slot of linkage area.  This simplifies allocation of space for the base
2651    pointer and makes it convenient to locate programatically and during
2652    debugging.</p>
2653
2654 </div>
2655
2656 <!-- _______________________________________________________________________ -->
2657 <div class="doc_subsubsection">
2658   <a name="ppc_dynamic">Dynamic Allocation</a>
2659 </div>
2660
2661 <div class="doc_text">
2662
2663 <p><i>TODO - More to come.</i></p>
2664
2665 </div>
2666
2667
2668 <!-- *********************************************************************** -->
2669 <hr>
2670 <address>
2671   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2672   src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
2673   <a href="http://validator.w3.org/check/referer"><img
2674   src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
2675
2676   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2677   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2678   Last modified: $Date$
2679 </address>
2680
2681 </body>
2682 </html>