docs/CodeGenerator.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3 <html>
   4 <head>
   5   <title>The LLVM Target-Independent Code Generator</title>
   6   <link rel="stylesheet" href="llvm.css" type="text/css">
   7 </head>
   8 <body>
   9
  10 <div class="doc_title">
  11   The LLVM Target-Independent Code Generator
  12 </div>
  13
  14 <ol>
  15   <li><a href="#introduction">Introduction</a>
  16     <ul>
  17       <li><a href="#required">Required components in the code generator</a></li>
  18       <li><a href="#high-level-design">The high-level design of the code
  19           generator</a></li>
  20       <li><a href="#tablegen">Using TableGen for target description</a></li>
  21     </ul>
  22   </li>
  23   <li><a href="#targetdesc">Target description classes</a>
  24     <ul>
  25       <li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
  26       <li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
  27       <li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
  28       <li><a href="#mregisterinfo">The <tt>MRegisterInfo</tt> class</a></li>
  29       <li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
  30       <li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
  31       <li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
  32       <li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
  33     </ul>
  34   </li>
  35   <li><a href="#codegendesc">Machine code description classes</a>
  36     <ul>
  37     <li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
  38     <li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
  39                                      class</a></li>
  40     <li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
  41     </ul>
  42   </li>
  43   <li><a href="#codegenalgs">Target-independent code generation algorithms</a>
  44     <ul>
  45     <li><a href="#instselect">Instruction Selection</a>
  46       <ul>
  47       <li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
  48       <li><a href="#selectiondag_process">SelectionDAG Code Generation
  49                                           Process</a></li>
  50       <li><a href="#selectiondag_build">Initial SelectionDAG
  51                                         Construction</a></li>
  52       <li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
  53       <li><a href="#selectiondag_optimize">SelectionDAG Optimization
  54                                            Phase: the DAG Combiner</a></li>
  55       <li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
  56       <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
  57                                         Phase</a></li>
  58       <li><a href="#selectiondag_future">Future directions for the
  59                                          SelectionDAG</a></li>
  60       </ul></li>
  61     <li><a href="#codeemit">Code Emission</a>
  62         <ul>
  63         <li><a href="#codeemit_asm">Generating Assembly Code</a></li>
  64         <li><a href="#codeemit_bin">Generating Binary Machine Code</a></li>
  65         </ul></li>
  66     </ul>
  67   </li>
  68   <li><a href="#targetimpls">Target-specific Implementation Notes</a>
  69     <ul>
  70     <li><a href="#x86">The X86 backend</a></li>
  71     </ul>
  72   </li>
  73
  74 </ol>
  75
  76 <div class="doc_author">
  77   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a> &amp;
  78   <a href="mailto:isanbard@gmail.com">Bill Wendling</a></p>
  79 </div>
  80
  81 <div class="doc_warning">
  82   <p>Warning: This is a work in progress.</p>
  83 </div>
  84
  85 <!-- *********************************************************************** -->
  86 <div class="doc_section">
  87   <a name="introduction">Introduction</a>
  88 </div>
  89 <!-- *********************************************************************** -->
  90
  91 <div class="doc_text">
  92
  93 <p>The LLVM target-independent code generator is a framework that provides a
  94 suite of reusable components for translating the LLVM internal representation to
  95 the machine code for a specified target&mdash;either in assembly form (suitable
  96 for a static compiler) or in binary machine code format (usable for a JIT
  97 compiler). The LLVM target-independent code generator consists of five main
  98 components:</p>
  99
 100 <ol>
 101 <li><a href="#targetdesc">Abstract target description</a> interfaces which
 102 capture important properties about various aspects of the machine, independently
 103 of how they will be used.  These interfaces are defined in
 104 <tt>include/llvm/Target/</tt>.</li>
 105
 106 <li>Classes used to represent the <a href="#codegendesc">machine code</a> being
 107 generated for a target.  These classes are intended to be abstract enough to
 108 represent the machine code for <i>any</i> target machine.  These classes are
 109 defined in <tt>include/llvm/CodeGen/</tt>.</li>
 110
 111 <li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
 112 various phases of native code generation (register allocation, scheduling, stack
 113 frame representation, etc).  This code lives in <tt>lib/CodeGen/</tt>.</li>
 114
 115 <li><a href="#targetimpls">Implementations of the abstract target description
 116 interfaces</a> for particular targets.  These machine descriptions make use of
 117 the components provided by LLVM, and can optionally provide custom
 118 target-specific passes, to build complete code generators for a specific target.
 119 Target descriptions live in <tt>lib/Target/</tt>.</li>
 120
 121 <li><a href="#jit">The target-independent JIT components</a>.  The LLVM JIT is
 122 completely target independent (it uses the <tt>TargetJITInfo</tt> structure to
 123 interface for target-specific issues).  The code for the target-independent
 124 JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
 125
 126 </ol>
 127
 128 <p>
 129 Depending on which part of the code generator you are interested in working on,
 130 different pieces of this will be useful to you.  In any case, you should be
 131 familiar with the <a href="#targetdesc">target description</a> and <a
 132 href="#codegendesc">machine code representation</a> classes.  If you want to add
 133 a backend for a new target, you will need to <a href="#targetimpls">implement the
 134 target description</a> classes for your new target and understand the <a
 135 href="LangRef.html">LLVM code representation</a>.  If you are interested in
 136 implementing a new <a href="#codegenalgs">code generation algorithm</a>, it
 137 should only depend on the target-description and machine code representation
 138 classes, ensuring that it is portable.
 139 </p>
 140
 141 </div>
 142
 143 <!-- ======================================================================= -->
 144 <div class="doc_subsection">
 145  <a name="required">Required components in the code generator</a>
 146 </div>
 147
 148 <div class="doc_text">
 149
 150 <p>The two pieces of the LLVM code generator are the high-level interface to the
 151 code generator and the set of reusable components that can be used to build
 152 target-specific backends.  The two most important interfaces (<a
 153 href="#targetmachine"><tt>TargetMachine</tt></a> and <a
 154 href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
 155 required to be defined for a backend to fit into the LLVM system, but the others
 156 must be defined if the reusable code generator components are going to be
 157 used.</p>
 158
 159 <p>This design has two important implications.  The first is that LLVM can
 160 support completely non-traditional code generation targets.  For example, the C
 161 backend does not require register allocation, instruction selection, or any of
 162 the other standard components provided by the system.  As such, it only
 163 implements these two interfaces, and does its own thing.  Another example of a
 164 code generator like this is a (purely hypothetical) backend that converts LLVM
 165 to the GCC RTL form and uses GCC to emit machine code for a target.</p>
 166
 167 <p>This design also implies that it is possible to design and
 168 implement radically different code generators in the LLVM system that do not
 169 make use of any of the built-in components.  Doing so is not recommended at all,
 170 but could be required for radically different targets that do not fit into the
 171 LLVM machine description model: FPGAs for example.</p>
 172
 173 </div>
 174
 175 <!-- ======================================================================= -->
 176 <div class="doc_subsection">
 177  <a name="high-level-design">The high-level design of the code generator</a>
 178 </div>
 179
 180 <div class="doc_text">
 181
 182 <p>The LLVM target-independent code generator is designed to support efficient,
 183 high-quality code generation for standard register-based microprocessors.  Code
 184 generation in this model is divided into the following stages:</p>
 185
 186 <ol>
 187 <li><b><a href="#instselect">Instruction Selection</a></b> - This phase
 188 determines an efficient way to express the input LLVM code in the target
 189 instruction set.
 190 This stage produces the initial code for the program in the target instruction
 191 set, then makes use of virtual registers in SSA form and physical registers that
 192 represent any required register assignments due to target constraints or calling
 193 conventions.  This step turns the LLVM code into a DAG of target
 194 instructions.</li>
 195
 196 <li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> - This
 197 phase takes the DAG of target instructions produced by the instruction selection
 198 phase, determines an ordering of the instructions, then emits the instructions
 199 as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.  Note
 200 that we describe this in the <a href="#instselect">instruction selection
 201 section</a> because it operates on a <a
 202 href="#selectiondag_intro">SelectionDAG</a>.
 203 </li>
 204
 205 <li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> - This
 206 optional stage consists of a series of machine-code optimizations that
 207 operate on the SSA-form produced by the instruction selector.  Optimizations
 208 like modulo-scheduling or peephole optimization work here.
 209 </li>
 210
 211 <li><b><a href="#regalloc">Register Allocation</a></b> - The
 212 target code is transformed from an infinite virtual register file in SSA form
 213 to the concrete register file used by the target.  This phase introduces spill
 214 code and eliminates all virtual register references from the program.</li>
 215
 216 <li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> - Once the
 217 machine code has been generated for the function and the amount of stack space
 218 required is known (used for LLVM alloca's and spill slots), the prolog and
 219 epilog code for the function can be inserted and "abstract stack location
 220 references" can be eliminated.  This stage is responsible for implementing
 221 optimizations like frame-pointer elimination and stack packing.</li>
 222
 223 <li><b><a href="#latemco">Late Machine Code Optimizations</a></b> - Optimizations
 224 that operate on "final" machine code can go here, such as spill code scheduling
 225 and peephole optimizations.</li>
 226
 227 <li><b><a href="#codeemit">Code Emission</a></b> - The final stage actually
 228 puts out the code for the current function, either in the target assembler
 229 format or in machine code.</li>
 230
 231 </ol>
 232
 233 <p>The code generator is based on the assumption that the instruction selector
 234 will use an optimal pattern matching selector to create high-quality sequences of
 235 native instructions.  Alternative code generator designs based on pattern
 236 expansion and aggressive iterative peephole optimization are much slower.  This
 237 design permits efficient compilation (important for JIT environments) and
 238 aggressive optimization (used when generating code offline) by allowing
 239 components of varying levels of sophistication to be used for any step of
 240 compilation.</p>
 241
 242 <p>In addition to these stages, target implementations can insert arbitrary
 243 target-specific passes into the flow.  For example, the X86 target uses a
 244 special pass to handle the 80x87 floating point stack architecture.  Other
 245 targets with unusual requirements can be supported with custom passes as
 246 needed.</p>
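<p>As a rough illustration of how a target-specific pass slots into the stages
listed above, the sketch below models the pipeline as an ordered list of stage
names.  It is not LLVM's pass manager: the stage strings, the
<tt>insertAfter</tt> helper, and the pass name <tt>toy-fp-stackifier</tt> are
all hypothetical, and placing the extra pass after register allocation is an
assumption made only for the example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// The seven stages from the list above, in order (names are illustrative).
inline std::vector<std::string> standardStages() {
  return {"instruction-selection",   "scheduling-and-formation",
          "ssa-machine-code-opts",   "register-allocation",
          "prolog-epilog-insertion", "late-machine-code-opts",
          "code-emission"};
}

// Insert a target-specific pass immediately after a named standard stage.
// Returns false (and leaves the pipeline unchanged) if the stage is unknown.
inline bool insertAfter(std::vector<std::string> &Pipeline,
                        const std::string &Stage,
                        const std::string &CustomPass) {
  assert(!CustomPass.empty() && "custom pass needs a name");
  auto It = std::find(Pipeline.begin(), Pipeline.end(), Stage);
  if (It == Pipeline.end())
    return false;
  Pipeline.insert(It + 1, CustomPass);
  return true;
}
```

<p>A target would call something like
<tt>insertAfter(Pipeline, "register-allocation", "toy-fp-stackifier")</tt> and
leave every other stage untouched.</p>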
 247
 248 </div>
 249
 250
 251 <!-- ======================================================================= -->
 252 <div class="doc_subsection">
 253  <a name="tablegen">Using TableGen for target description</a>
 254 </div>
 255
 256 <div class="doc_text">
 257
 258 <p>The target description classes require a detailed description of the target
 259 architecture.  These target descriptions often have a large amount of common
 260 information (e.g., an <tt>add</tt> instruction is almost identical to a
 261 <tt>sub</tt> instruction).
 262 In order to allow the maximum amount of commonality to be factored out, the LLVM
 263 code generator uses the <a href="TableGenFundamentals.html">TableGen</a> tool to
 264 describe big chunks of the target machine, which allows the use of
 265 domain-specific and target-specific abstractions to reduce the amount of
 266 repetition.</p>
 267
 268 <p>As LLVM continues to be developed and refined, we plan to move more and more
 269 of the target description to the <tt>.td</tt> form.  Doing so gives us a
 270 number of advantages.  The most important is that it makes it easier to port
 271 LLVM because it reduces the amount of C++ code that has to be written, and the
 272 surface area of the code generator that needs to be understood before someone
 273 can get something working.  Second, it makes it easier to change things. In
 274 particular, if tables and other things are all emitted by <tt>tblgen</tt>, we
 275 only need a change in one place (<tt>tblgen</tt>) to update all of the targets
 276 to a new interface.</p>
 277
 278 </div>
 279
 280 <!-- *********************************************************************** -->
 281 <div class="doc_section">
 282   <a name="targetdesc">Target description classes</a>
 283 </div>
 284 <!-- *********************************************************************** -->
 285
 286 <div class="doc_text">
 287
 288 <p>The LLVM target description classes (located in the
 289 <tt>include/llvm/Target</tt> directory) provide an abstract description of the
 290 target machine independent of any particular client.  These classes are
 291 designed to capture the <i>abstract</i> properties of the target (such as the
 292 instructions and registers it has), and do not incorporate any particular pieces
 293 of code generation algorithms.</p>
 294
 295 <p>All of the target description classes (except the <tt><a
 296 href="#targetdata">TargetData</a></tt> class) are designed to be subclassed by
 297 the concrete target implementation, and have virtual methods implemented.  To
 298 get to these implementations, the <tt><a
 299 href="#targetmachine">TargetMachine</a></tt> class provides accessors that
 300 should be implemented by the target.</p>
 301
 302 </div>
 303
 304 <!-- ======================================================================= -->
 305 <div class="doc_subsection">
 306   <a name="targetmachine">The <tt>TargetMachine</tt> class</a>
 307 </div>
 308
 309 <div class="doc_text">
 310
 311 <p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
 312 access the target-specific implementations of the various target description
 313 classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
 314 <tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.).  This class is
 315 designed to be specialized by
 316 a concrete target implementation (e.g., <tt>X86TargetMachine</tt>) which
 317 implements the various virtual methods.  The only required target description
 318 class is the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the
 319 code generator components are to be used, the other interfaces should be
 320 implemented as well.</p>
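<p>The accessor pattern can be sketched as follows.  This is a simplified model
and not the real LLVM headers: <tt>SketchTargetMachine</tt>,
<tt>SketchTargetData</tt>, <tt>SketchInstrInfo</tt>, and
<tt>ToyTargetMachine</tt> are hypothetical names, and the choice to return a
null pointer for an unimplemented interface is an assumption of this
sketch.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two target description classes.
struct SketchTargetData { unsigned PointerSizeInBytes; };
struct SketchInstrInfo  { std::string Name; };

// Base class: only the TargetData-like accessor is mandatory.  Optional
// interfaces default to "not provided" (a null pointer) in this sketch.
class SketchTargetMachine {
public:
  virtual ~SketchTargetMachine() {}
  virtual const SketchTargetData &getTargetData() const = 0;
  virtual const SketchInstrInfo *getInstrInfo() const { return nullptr; }
};

// A concrete target overrides the accessors for the interfaces it supports.
class ToyTargetMachine : public SketchTargetMachine {
  SketchTargetData DataLayout{4};
  SketchInstrInfo InstrInfo{"toy"};

public:
  const SketchTargetData &getTargetData() const override { return DataLayout; }
  const SketchInstrInfo *getInstrInfo() const override { return &InstrInfo; }
};
```

<p>Clients hold a reference to the base class and reach the target-specific
objects only through the virtual accessors.</p>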
 321
 322 </div>
 323
 324
 325 <!-- ======================================================================= -->
 326 <div class="doc_subsection">
 327   <a name="targetdata">The <tt>TargetData</tt> class</a>
 328 </div>
 329
 330 <div class="doc_text">
 331
 332 <p>The <tt>TargetData</tt> class is the only required target description class,
 333 and it is the only class that is not extensible (you cannot derive a new
 334 class from it).  <tt>TargetData</tt> specifies information about how the target
 335 lays out memory for structures, the alignment requirements for various data
 336 types, the size of pointers in the target, and whether the target is
 337 little-endian or big-endian.</p>
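<p>To make the structure layout idea concrete, here is a minimal sketch of how
field offsets can be computed from per-type sizes and alignments.  It assumes
the common C-style rule (each field is padded up to its own alignment, and the
structure is padded to its largest field alignment); the real
<tt>TargetData</tt> class may differ in detail, and the names
<tt>FieldInfo</tt>, <tt>Layout</tt>, <tt>alignTo</tt>, and
<tt>layoutStruct</tt> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Size and alignment (in bytes) of one field, as a target might report them.
struct FieldInfo {
  uint64_t Size;
  uint64_t Align; // must be a power of two
};

struct Layout {
  std::vector<uint64_t> Offsets; // byte offset of each field
  uint64_t Size;                 // total size, including tail padding
  uint64_t Align;                // alignment of the whole structure
};

// Round Value up to the next multiple of Align (a power of two).
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Lay out fields in declaration order, padding each one to its alignment.
inline Layout layoutStruct(const std::vector<FieldInfo> &Fields) {
  Layout L{{}, 0, 1};
  for (const FieldInfo &F : Fields) {
    L.Size = alignTo(L.Size, F.Align);
    L.Offsets.push_back(L.Size);
    L.Size += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(L.Size, L.Align); // tail padding
  return L;
}
```

<p>For a 1-byte field followed by a 4-byte field and a 2-byte field (each
aligned to its own size), this places the fields at offsets 0, 4, and 8 and
reports a total size of 12 bytes.</p>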
 338
 339 </div>
 340
 341 <!-- ======================================================================= -->
 342 <div class="doc_subsection">
 343   <a name="targetlowering">The <tt>TargetLowering</tt> class</a>
 344 </div>
 345
 346 <div class="doc_text">
 347
 348 <p>The <tt>TargetLowering</tt> class is used by SelectionDAG-based instruction
 349 selectors primarily to describe how LLVM code should be lowered to SelectionDAG
 350 operations.  Among other things, this class indicates:</p>
 351
 352 <ul>
 353   <li>an initial register class to use for various <tt>ValueType</tt>s</li>
 354   <li>which operations are natively supported by the target machine</li>
 355   <li>the return type of <tt>setcc</tt> operations</li>
 356   <li>the type to use for shift amounts</li>
 357   <li>various high-level characteristics, like whether it is profitable to turn
 358       division by a constant into a multiplication sequence</li>
 359 </ul>
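<p>One way to picture the "which operations are natively supported" query is a
table keyed by operation and value type.  The sketch below is a toy model, not
the real <tt>TargetLowering</tt> interface: the enums, the
<tt>SketchLowering</tt> class, and its method names are hypothetical, as is the
rule that anything not listed is treated as legal.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical operation and value-type identifiers.
enum class Op { Add, Mul, SDiv, FSin };
enum class VT { i32, i64, f32 };

// What a legalizer should do with an (operation, type) pair.
enum class Action { Legal, Expand };

class SketchLowering {
  std::map<std::pair<Op, VT>, Action> Actions;

public:
  // Targets record only the exceptions; unlisted pairs count as legal.
  void setOperationAction(Op O, VT T, Action A) {
    Actions[std::make_pair(O, T)] = A;
  }

  Action getOperationAction(Op O, VT T) const {
    auto It = Actions.find(std::make_pair(O, T));
    return It == Actions.end() ? Action::Legal : It->second;
  }

  bool isOperationLegal(Op O, VT T) const {
    return getOperationAction(O, T) == Action::Legal;
  }
};
```

<p>A target without a hardware sine instruction would mark
<tt>(Op::FSin, VT::f32)</tt> as <tt>Action::Expand</tt>, and later phases would
query the table instead of hard-coding target knowledge.</p>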
 360
 361 </div>
 362
 363 <!-- ======================================================================= -->
 364 <div class="doc_subsection">
 365   <a name="mregisterinfo">The <tt>MRegisterInfo</tt> class</a>
 366 </div>
 367
 368 <div class="doc_text">
 369
 370 <p>The <tt>MRegisterInfo</tt> class (which will eventually be renamed to
 371 <tt>TargetRegisterInfo</tt>) is used to describe the register file of the
 372 target and any interactions between the registers.</p>
 373
 374 <p>Registers are represented in the code generator by
 375 unsigned integers.  Physical registers (those that actually exist in the target
 376 description) are unique small numbers, and virtual registers are generally
 377 large.  Note that register #0 is reserved as a flag value.</p>
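<p>The numbering scheme can be sketched with two small predicates.  The
boundary value 1024 below is an assumption made for illustration (it matches
the <tt>reg1024</tt>-style names in the machine code dumps later in this
document); consult the real headers for the actual constant.  The helper names
are likewise hypothetical.</p>

```cpp
#include <cassert>

// Register #0 is reserved as a flag value and names no register.
const unsigned NoRegister = 0;

// Assumed boundary between physical and virtual register numbers.
const unsigned FirstVirtualRegister = 1024;

// Physical registers are the small, nonzero numbers.
inline bool isPhysicalRegister(unsigned Reg) {
  return Reg != NoRegister && Reg < FirstVirtualRegister;
}

// Virtual registers are everything at or above the boundary.
inline bool isVirtualRegister(unsigned Reg) {
  return Reg >= FirstVirtualRegister;
}
```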
 378
 379 <p>Each register in the processor description has an associated
 380 <tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
 381 register (used for assembly output and debugging dumps) and a set of aliases
 382 (used to indicate whether one register overlaps with another).
 383 </p>
 384
 385 <p>In addition to the per-register description, the <tt>MRegisterInfo</tt> class
 386 exposes a set of processor specific register classes (instances of the
 387 <tt>TargetRegisterClass</tt> class).  Each register class contains sets of
 388 registers that have the same properties (for example, they are all 32-bit
 389 integer registers).  Each SSA virtual register created by the instruction
 390 selector has an associated register class.  When the register allocator runs, it
 391 replaces virtual registers with a physical register in the set.</p>
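<p>The interplay between aliases and register classes can be sketched as
follows.  This is a toy model under stated assumptions: the register numbers,
the alias table, and the names <tt>aliasesOf</tt>, <tt>isAvailable</tt>,
<tt>SketchRegisterClass</tt>, and <tt>pickRegister</tt> are hypothetical, and a
real allocator is far more involved than "take the first free register".</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical physical register numbers for a toy target.
const unsigned NoReg = 0, AX = 1, EAX = 2, ECX = 3;

// Per-register alias lists: AX and EAX overlap, ECX overlaps nothing.
inline const std::vector<unsigned> &aliasesOf(unsigned Reg) {
  static const std::map<unsigned, std::vector<unsigned>> Table = {
      {AX, {EAX}}, {EAX, {AX}}, {ECX, {}}};
  static const std::vector<unsigned> None;
  auto It = Table.find(Reg);
  return It == Table.end() ? None : It->second;
}

// A register is unavailable if it, or anything it overlaps, is in use.
inline bool isAvailable(unsigned Reg, const std::set<unsigned> &InUse) {
  if (InUse.count(Reg))
    return false;
  for (unsigned Alias : aliasesOf(Reg))
    if (InUse.count(Alias))
      return false;
  return true;
}

// A register class is a set of interchangeable registers, in allocation order.
struct SketchRegisterClass {
  std::vector<unsigned> Regs;
};

// Pick the first available register in the class, or NoReg if none is free
// (the point at which a real allocator would have to spill).
inline unsigned pickRegister(const SketchRegisterClass &RC,
                             const std::set<unsigned> &InUse) {
  for (unsigned Reg : RC.Regs)
    if (isAvailable(Reg, InUse))
      return Reg;
  return NoReg;
}
```

<p>With a class containing <tt>EAX</tt> and <tt>ECX</tt>, marking <tt>AX</tt>
as in use rules out <tt>EAX</tt> through the alias table, so the sketch falls
back to <tt>ECX</tt>.</p>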
 392
 393 <p>
 394 The target-specific implementations of these classes are auto-generated from a <a
 395 href="TableGenFundamentals.html">TableGen</a> description of the register file.
 396 </p>
 397
 398 </div>
 399
 400 <!-- ======================================================================= -->
 401 <div class="doc_subsection">
 402   <a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
 403 </div>
 404
 405 <div class="doc_text">
 406   <p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
 407   instructions supported by the target. It is essentially an array of
 408   <tt>TargetInstrDescriptor</tt> objects, each of which describes one
 409   instruction the target supports. Descriptors define things like the mnemonic
 410   for the opcode, the number of operands, the list of implicit register uses
 411   and defs, whether the instruction has certain target-independent properties
 412   (accesses memory, is commutable, etc.), and any target-specific
 413   flags.</p>
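<p>The "array of descriptors" idea can be sketched as a table indexed by opcode
number.  Everything here is illustrative: the struct layout, the flag values,
the three toy opcodes, and the helper names are assumptions of this sketch, not
the fields of the real descriptor type.</p>

```cpp
#include <cassert>
#include <cstring>

// Hypothetical target-independent property flags.
enum : unsigned {
  FlagLoad = 1u << 0,
  FlagStore = 1u << 1,
  FlagCommutable = 1u << 2
};

struct SketchInstrDescriptor {
  const char *Name;     // mnemonic for the opcode
  unsigned NumOperands; // number of explicit operands
  unsigned Flags;       // target-independent properties
};

// Hypothetical opcodes; each enumerator indexes the table below.
enum ToyOpcode : unsigned { ADD = 0, LOAD = 1, STORE = 2, NumOpcodes };

static const SketchInstrDescriptor ToyInsts[NumOpcodes] = {
    {"add", 3, FlagCommutable},
    {"load", 2, FlagLoad},
    {"store", 2, FlagStore},
};

// Look up the descriptor for an opcode number.
inline const SketchInstrDescriptor &getDescriptor(unsigned Opcode) {
  assert(Opcode < NumOpcodes && "opcode out of range");
  return ToyInsts[Opcode];
}

inline bool isCommutable(unsigned Opcode) {
  return (getDescriptor(Opcode).Flags & FlagCommutable) != 0;
}

inline bool accessesMemory(unsigned Opcode) {
  return (getDescriptor(Opcode).Flags & (FlagLoad | FlagStore)) != 0;
}
```

<p>Because the table is plain data, it is the kind of structure that lends
itself to being generated by a tool rather than written by hand.</p>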
 414 </div>
 415
 416 <!-- ======================================================================= -->
 417 <div class="doc_subsection">
 418   <a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
 419 </div>
 420
 421 <div class="doc_text">
 422   <p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
 423   stack frame layout of the target. It holds the direction of stack growth,
 424   the known stack alignment on entry to each function, and the offset to the
 425   local area.  The offset to the local area is the offset from the stack
 426   pointer on function entry to the first location where function data (local
 427   variables, spill locations) can be stored.</p>
 428 </div>
 429
 430 <!-- ======================================================================= -->
 431 <div class="doc_subsection">
 432   <a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
 433 </div>
 434
 435 <div class="doc_text">
 436   <p>The <tt>TargetSubtarget</tt> class is used to provide information about the
 437   specific chip set being targeted.  A sub-target informs code generation of
 438   which instructions are supported, instruction latencies and instruction
 439   execution itinerary; i.e., which processing units are used, in what order, and
 440   for how long.</p>
 441 </div>
 442
 443
 444 <!-- ======================================================================= -->
 445 <div class="doc_subsection">
 446   <a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
 447 </div>
 448
 449 <div class="doc_text">
 450   <p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
 451   Just-In-Time code generator to perform target-specific activities, such as
 452   emitting stubs.  If a <tt>TargetMachine</tt> supports JIT code generation, it
 453   should provide one of these objects through the <tt>getJITInfo</tt>
 454   method.</p>
 455 </div>
 456
 457 <!-- *********************************************************************** -->
 458 <div class="doc_section">
 459   <a name="codegendesc">Machine code description classes</a>
 460 </div>
 461 <!-- *********************************************************************** -->
 462
 463 <div class="doc_text">
 464
 465 <p>At a high level, LLVM code is translated to a machine-specific
 466 representation formed out of
 467 <a href="#machinefunction"><tt>MachineFunction</tt></a>,
 468 <a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>, and <a
 469 href="#machineinstr"><tt>MachineInstr</tt></a> instances
 470 (defined in <tt>include/llvm/CodeGen</tt>).  This representation is completely
 471 target agnostic, representing instructions in their most abstract form: an
 472 opcode and a series of operands.  This representation is designed to support
 473 both an SSA representation for machine code, as well as a register allocated,
 474 non-SSA form.</p>
 475
 476 </div>
 477
 478 <!-- ======================================================================= -->
 479 <div class="doc_subsection">
 480   <a name="machineinstr">The <tt>MachineInstr</tt> class</a>
 481 </div>
 482
 483 <div class="doc_text">
 484
 485 <p>Target machine instructions are represented as instances of the
 486 <tt>MachineInstr</tt> class.  This class is an extremely abstract way of
 487 representing machine instructions.  In particular, it only keeps track of
 488 an opcode number and a set of operands.</p>
 489
 490 <p>The opcode number is a simple unsigned integer that only has meaning to a
 491 specific backend.  All of the instructions for a target should be defined in
 492 the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values
 493 are auto-generated from this description.  The <tt>MachineInstr</tt> class does
 494 not have any information about how to interpret the instruction (i.e., what the
 495 semantics of the instruction are); for that you must refer to the
 496 <tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
 497
 498 <p>The operands of a machine instruction can be of several different types:
 499 a register reference, a constant integer, a basic block reference, etc.  In
 500 addition, a machine operand should be marked as a def or a use of the value
 501 (though only registers are allowed to be defs).</p>
 502
 503 <p>By convention, the LLVM code generator orders instruction operands so that
 504 all register definitions come before the register uses, even on architectures
 505 that are normally printed in other orders.  For example, the SPARC add
 506 instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
 507 and stores the result into the "%i3" register.  In the LLVM code generator,
 508 the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the destination
 509 first.</p>
 510
 511 <p>Keeping destination (definition) operands at the beginning of the operand
 512 list has several advantages.  In particular, the debugging printer will print
 513 the instruction like this:</p>
 514
 515 <div class="doc_code">
 516 <pre>
 517 %i3 = add %i1, %i2
 518 </pre>
 519 </div>
 520
 521 <p>Also if the first operand is a def, it is easier to <a
 522 href="#buildmi">create instructions</a> whose only def is the first
 523 operand.</p>
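<p>The defs-first convention can be illustrated with a small model of an
instruction and a printer that relies on it.  This is only a sketch:
<tt>SketchOperand</tt>, <tt>SketchInstr</tt>, and <tt>printInstr</tt> are
hypothetical names, and registers are stored as strings purely to keep the
example short.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct SketchOperand {
  std::string Reg;
  bool IsDef;
};

struct SketchInstr {
  std::string Opcode;
  std::vector<SketchOperand> Operands; // definitions first, then uses
};

// Print as "defs = opcode uses", relying on the defs-first ordering.
inline std::string printInstr(const SketchInstr &MI) {
  std::ostringstream OS;
  std::size_t I = 0;
  // Leading definition operands.
  for (; I < MI.Operands.size() && MI.Operands[I].IsDef; ++I)
    OS << (I ? ", " : "") << MI.Operands[I].Reg;
  if (I)
    OS << " = ";
  OS << MI.Opcode;
  // Every remaining operand is a use.
  for (std::size_t FirstUse = I; I < MI.Operands.size(); ++I)
    OS << (I == FirstUse ? " " : ", ") << MI.Operands[I].Reg;
  return OS.str();
}
```

<p>Storing the SPARC add from the text as <tt>%i3, %i1, %i2</tt> (with only
the first operand marked as a def) makes the printer split the operand list at
the first use without any per-target knowledge.</p>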
 524
 525 </div>
 526
 527 <!-- _______________________________________________________________________ -->
 528 <div class="doc_subsubsection">
 529   <a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
 530 </div>
 531
 532 <div class="doc_text">
 533
 534 <p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
 535 located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file.  The
 536 <tt>BuildMI</tt> functions make it easy to build arbitrary machine
 537 instructions.  Usage of the <tt>BuildMI</tt> functions looks like this:</p>
 538
 539 <div class="doc_code">
 540 <pre>
 541 // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
 542 // instruction.  The '1' specifies how many operands will be added.
 543 MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
 544
 545 // Create the same instr, but insert it at the end of a basic block.
 546 MachineBasicBlock &amp;MBB = ...
 547 BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
 548
 549 // Create the same instr, but insert it before a specified iterator point.
 550 MachineBasicBlock::iterator MBBI = ...
 551 BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
 552
 553 // Create a 'cmp Reg, 0' instruction, no destination reg.
 554 MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
 555 // Create an 'sahf' instruction which takes no operands and stores nothing.
 556 MI = BuildMI(X86::SAHF, 0);
 557
 558 // Create a self looping branch instruction.
 559 BuildMI(MBB, X86::JNE, 1).addMBB(&amp;MBB);
 560 </pre>
 561 </div>
 562
 563 <p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
 564 have to specify the number of operands that the machine instruction will take.
 565 This allows for efficient memory allocation.  Note also that operands
 566 default to being uses of values, not definitions.  If you need to add a
 567 definition operand (other than the optional destination register), you must
 568 explicitly mark it as such:</p>
 569
 570 <div class="doc_code">
 571 <pre>
 572 MI.addReg(Reg, MachineOperand::Def);
 573 </pre>
 574 </div>
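<p>To show why the operand count is passed up front and why operands default to
uses, here is a self-contained model of the builder pattern.  It does not use
the real <tt>MachineInstrBuilder.h</tt>: <tt>SketchBuilder</tt>,
<tt>buildInstr</tt>, and the operand representation are hypothetical, and
reserving one extra slot for the destination register is an assumption of this
sketch.</p>

```cpp
#include <cassert>
#include <vector>

struct SketchOperand {
  enum Kind { Register, Immediate } K;
  long Value;
  bool IsDef;
};

struct SketchInstr {
  unsigned Opcode;
  std::vector<SketchOperand> Operands;
};

class SketchBuilder {
  SketchInstr MI;

public:
  // Knowing the operand count lets the storage be allocated once.
  SketchBuilder(unsigned Opcode, unsigned NumOperands) : MI{Opcode, {}} {
    MI.Operands.reserve(NumOperands);
  }

  // Register operands are uses unless the caller explicitly asks for a def.
  SketchBuilder &addReg(unsigned Reg, bool IsDef = false) {
    MI.Operands.push_back(
        {SketchOperand::Register, static_cast<long>(Reg), IsDef});
    return *this;
  }

  // Immediates can never be definitions.
  SketchBuilder &addImm(long Imm) {
    MI.Operands.push_back({SketchOperand::Immediate, Imm, false});
    return *this;
  }

  const SketchInstr &instr() const { return MI; }
};

// Variant with a destination register: the destination is added first and
// marked as a def, so later operands keep the defs-first ordering.
inline SketchBuilder buildInstr(unsigned Opcode, unsigned NumOperands,
                                unsigned DestReg) {
  SketchBuilder B(Opcode, NumOperands + 1);
  B.addReg(DestReg, /*IsDef=*/true);
  return B;
}
```

<p>Each <tt>add*</tt> method returns the builder itself, which is what allows
the chained call style seen in the examples above.</p>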
 575
 576 </div>
 577
 578 <!-- _______________________________________________________________________ -->
 579 <div class="doc_subsubsection">
 580   <a name="fixedregs">Fixed (preassigned) registers</a>
 581 </div>
 582
 583 <div class="doc_text">
 584
 585 <p>One important issue that the code generator needs to be aware of is the
 586 presence of fixed registers.  In particular, there are often places in the
 587 instruction stream where the register allocator <em>must</em> arrange for a
 588 particular value to be in a particular register.  This can occur due to
 589 limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
 590 with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like calling
 591 conventions.  In any case, the instruction selector should emit code that
 592 copies a virtual register into or out of a physical register when needed.</p>
 593
 594 <p>For example, consider this simple LLVM example:</p>
 595
 596 <div class="doc_code">
 597 <pre>
 598 int %test(int %X, int %Y) {
 599   %Z = div int %X, %Y
 600   ret int %Z
 601 }
 602 </pre>
 603 </div>
 604
 605 <p>The X86 instruction selector produces this machine code for the <tt>div</tt>
 606 and <tt>ret</tt> (use
 607 "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to get this):</p>
 608
 609 <div class="doc_code">
 610 <pre>
 611 ;; Start of div
 612 %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
 613 %reg1027 = sar %reg1024, 31
 614 %EDX = mov %reg1027           ;; Sign extend X into EDX
 615 idiv %reg1025                 ;; Divide by Y (in reg1025)
 616 %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
 617
 618 ;; Start of ret
 619 %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
 620 ret
 621 </pre>
 622 </div>
 623
 624 <p>By the end of code generation, the register allocator has coalesced
 625 the registers and deleted the resultant identity moves, producing the
 626 following code:</p>
 627
 628 <div class="doc_code">
 629 <pre>
 630 ;; X is in EAX, Y is in ECX
 631 mov %EAX, %EDX
 632 sar %EDX, 31
 633 idiv %ECX
 634 ret
 635 </pre>
 636 </div>
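<p>The "delete the resultant identity moves" step can be sketched in isolation.
The code below is a toy model: it takes a virtual-to-physical assignment as
given (it does not perform register allocation or coalescing itself), it
ignores two-address constraints, and the names <tt>SketchInstr</tt> and
<tt>rewriteAndCleanup</tt> are hypothetical.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct SketchInstr {
  std::string Opcode;
  std::vector<std::string> Operands; // destination first for "mov"
};

// Rewrite virtual register names using the given assignment, then drop any
// "mov X, X" that the rewrite turned into an identity copy.
inline std::vector<SketchInstr>
rewriteAndCleanup(const std::vector<SketchInstr> &Code,
                  const std::map<std::string, std::string> &Assignment) {
  std::vector<SketchInstr> Out;
  for (SketchInstr MI : Code) { // copy, so the input stays untouched
    for (std::string &Op : MI.Operands) {
      auto It = Assignment.find(Op);
      if (It != Assignment.end())
        Op = It->second;
    }
    bool IsIdentityMove = MI.Opcode == "mov" && MI.Operands.size() == 2 &&
                          MI.Operands[0] == MI.Operands[1];
    if (!IsIdentityMove)
      Out.push_back(MI);
  }
  return Out;
}
```

<p>If <tt>reg1024</tt> and <tt>reg1026</tt> are both assigned to
<tt>EAX</tt>, the copies into and out of <tt>EAX</tt> around the divide and
before the return all become <tt>mov EAX, EAX</tt> and disappear.</p>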
 637
 638 <p>This approach is extremely general (if it can handle the X86 architecture,
 639 it can handle anything!) and allows all of the target specific
 640 knowledge about the instruction stream to be isolated in the instruction
 641 selector.  Note that physical registers should have a short lifetime for good
 642 code generation, and all physical registers are assumed dead on entry to and
 643 exit from basic blocks (before register allocation).  Thus, if you need a value
 644 to be live across basic block boundaries, it <em>must</em> live in a virtual
 645 register.</p>
 646
 647 </div>
 648
 649 <!-- _______________________________________________________________________ -->
 650 <div class="doc_subsubsection">
 651   <a name="ssa">Machine code in SSA form</a>
 652 </div>
 653
 654 <div class="doc_text">
 655
 656 <p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and
 657 are maintained in SSA-form until register allocation happens.  For the most
 658 part, this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
 659 become machine code PHI nodes, and virtual registers are only allowed to have a
 660 single definition.</p>
 661
 662 <p>After register allocation, machine code is no longer in SSA-form because there
 663 are no virtual registers left in the code.</p>
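<p>The single-definition rule for virtual registers can be expressed as a small
checker.  This is a sketch under stated assumptions: instructions are reduced to
one optional defined register plus a list of uses, the boundary value 1024
between physical and virtual registers is assumed for illustration, and
<tt>isInSSAForm</tt> is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Assumed boundary between physical and virtual register numbers.
const unsigned FirstVirtualRegister = 1024;

struct SketchInstr {
  unsigned DefReg;            // 0 means "defines no register"
  std::vector<unsigned> Uses; // registers read by the instruction
};

// Returns true if no virtual register is defined more than once.
// Physical registers may be redefined freely, so they are skipped.
inline bool isInSSAForm(const std::vector<SketchInstr> &Code) {
  std::set<unsigned> Defined;
  for (const SketchInstr &MI : Code) {
    if (MI.DefReg < FirstVirtualRegister)
      continue; // physical register, or no definition at all
    if (!Defined.insert(MI.DefReg).second)
      return false; // second definition of the same virtual register
  }
  return true;
}
```

<p>The check only constrains virtual registers, which is why code that copies
into the same physical register several times (as in the fixed-register example
above) still counts as SSA form.</p>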
 664
 665 </div>
 666
 667 <!-- ======================================================================= -->
 668 <div class="doc_subsection">
 669   <a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
 670 </div>
 671
 672 <div class="doc_text">
 673
<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
(<tt><a href="#machineinstr">MachineInstr</a></tt> instances).  It roughly
corresponds to a basic block in the LLVM code input to the instruction selector,
but the mapping can be one-to-many (i.e. one LLVM basic block can map to multiple
machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
comes from.</p>
 681
 682 </div>
 683
 684 <!-- ======================================================================= -->
 685 <div class="doc_subsection">
 686   <a name="machinefunction">The <tt>MachineFunction</tt> class</a>
 687 </div>
 688
 689 <div class="doc_text">
 690
<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances).  It
corresponds one-to-one with the LLVM function input to the instruction selector.
In addition to a list of basic blocks, the <tt>MachineFunction</tt> contains
a <tt>MachineConstantPool</tt>, a <tt>MachineFrameInfo</tt>, a
<tt>MachineFunctionInfo</tt>, a <tt>SSARegMap</tt>, and a set of live-in and
live-out registers for the function.  See
<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
 699
 700 </div>
 701
 702 <!-- *********************************************************************** -->
 703 <div class="doc_section">
 704   <a name="codegenalgs">Target-independent code generation algorithms</a>
 705 </div>
 706 <!-- *********************************************************************** -->
 707
 708 <div class="doc_text">
 709
 710 <p>This section documents the phases described in the <a
 711 href="#high-level-design">high-level design of the code generator</a>.  It
 712 explains how they work and some of the rationale behind their design.</p>
 713
 714 </div>
 715
 716 <!-- ======================================================================= -->
 717 <div class="doc_subsection">
 718   <a name="instselect">Instruction Selection</a>
 719 </div>
 720
 721 <div class="doc_text">
 722 <p>
 723 Instruction Selection is the process of translating LLVM code presented to the
 724 code generator into target-specific machine instructions.  There are several
 725 well-known ways to do this in the literature.  In LLVM there are two main forms:
 726 the SelectionDAG based instruction selector framework and an old-style 'simple'
 727 instruction selector, which effectively peephole selects each LLVM instruction
 728 into a series of machine instructions.  We recommend that all targets use the
 729 SelectionDAG infrastructure.
 730 </p>
 731
 732 <p>Portions of the DAG instruction selector are generated from the target
 733 description (<tt>*.td</tt>) files.  Our goal is for the entire instruction
 734 selector to be generated from these <tt>.td</tt> files.</p>
 735 </div>
 736
 737 <!-- _______________________________________________________________________ -->
 738 <div class="doc_subsubsection">
 739   <a name="selectiondag_intro">Introduction to SelectionDAGs</a>
 740 </div>
 741
 742 <div class="doc_text">
 743
<p>The SelectionDAG provides an abstraction for code representation in a way
that is amenable to instruction selection using automatic techniques
(e.g. dynamic-programming based optimal pattern matching selectors). It is also
well-suited to other phases of code generation; in particular,
instruction scheduling (SelectionDAGs are very close to scheduling DAGs
post-selection).  Additionally, the SelectionDAG provides a host representation
where a large variety of very-low-level (but target-independent)
<a href="#selectiondag_optimize">optimizations</a> may be
performed: ones that require extensive information about the instructions
efficiently supported by the target.</p>
 754
<p>The SelectionDAG is a directed acyclic graph whose nodes are instances of the
<tt>SDNode</tt> class.  The primary payload of the <tt>SDNode</tt> is its
operation code (Opcode), which indicates what operation the node performs, and
the operands to the operation.
The various operation node types are described at the top of the
<tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt> file.</p>
 761
<p>Although most operations define a single value, each node in the graph may
define multiple values.  For example, a combined div/rem operation will define
both the quotient and the remainder. Many other situations require multiple
values as well.  Each node also has some number of operands, which are edges
to the node defining the used value.  Because nodes may define multiple values,
edges are represented by instances of the <tt>SDOperand</tt> class, which is
a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node and result
value being used, respectively.  Each value produced by an <tt>SDNode</tt> has
an associated <tt>MVT::ValueType</tt> indicating what type the value is.</p>
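<p>The following is a minimal sketch of this arrangement, not the actual
<tt>SDNode</tt>/<tt>SDOperand</tt> classes.  The names <tt>Node</tt>,
<tt>Operand</tt>, <tt>VT</tt>, and <tt>typeOf</tt> are made up for the example;
it only shows how an edge can name one particular result of a node that defines
several.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical value types; the real enumeration is MVT::ValueType.
enum class VT { i32, f32 };

struct Node; // defined below

// An edge names the defining node and which of its results is used,
// mirroring the <SDNode, unsigned> pair described above.
struct Operand {
  const Node *N;
  unsigned ResNo;
};

struct Node {
  std::string Opcode;            // e.g. "divrem"
  std::vector<Operand> Operands; // values this node consumes
  std::vector<VT> ResultTypes;   // one entry per value this node defines
};

// Type of the value that an edge refers to.
VT typeOf(const Operand &Op) {
  assert(Op.N && Op.ResNo < Op.N->ResultTypes.size() && "bad result number");
  return Op.N->ResultTypes[Op.ResNo];
}
```

<p>With this layout, a combined div/rem node has two entries in
<tt>ResultTypes</tt>; a user of the quotient holds an edge with result number 0
and a user of the remainder holds an edge with result number 1.</p>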
 771
 772 <p>SelectionDAGs contain two different kinds of values: those that represent
 773 data flow and those that represent control flow dependencies.  Data values are
 774 simple edges with an integer or floating point value type.  Control edges are
 775 represented as "chain" edges which are of type <tt>MVT::Other</tt>.  These edges
 776 provide an ordering between nodes that have side effects (such as
 777 loads, stores, calls, returns, etc).  All nodes that have side effects should
 778 take a token chain as input and produce a new one as output.  By convention,
 779 token chain inputs are always operand #0, and chain results are always the last
 780 value produced by an operation.</p>
 781
 782 <p>A SelectionDAG has designated "Entry" and "Root" nodes.  The Entry node is
 783 always a marker node with an Opcode of <tt>ISD::EntryToken</tt>.  The Root node
 784 is the final side-effecting node in the token chain. For example, in a single
 785 basic block function it would be the return node.</p>
 786
 787 <p>One important concept for SelectionDAGs is the notion of a "legal" vs.
 788 "illegal" DAG.  A legal DAG for a target is one that only uses supported
 789 operations and supported types.  On a 32-bit PowerPC, for example, a DAG with
 790 a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a
 791 SREM or UREM operation.  The
 792 <a href="#selectiondag_legalize">legalize</a> phase is responsible for turning
 793 an illegal DAG into a legal DAG.</p>
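<p>To make the type half of this definition concrete, here is a small sketch
for an imaginary 32-bit target on which i32 is the only legal integer type, as
in the PowerPC example.  <tt>Action</tt>, <tt>actionForIntegerType</tt>, and
<tt>allTypesLegal</tt> are hypothetical names for illustration only; a real
target describes its legal types through the <tt>TargetLowering</tt>
interface.</p>

```cpp
#include <cassert>
#include <vector>

// What has to happen to a value of a given integer type.
enum class Action { Legal, Promote, Expand };

// Hypothetical 32-bit target: narrower integers are promoted to i32 and
// i64 is expanded into i32 pieces.
Action actionForIntegerType(unsigned Bits) {
  assert((Bits == 1 || Bits == 8 || Bits == 16 || Bits == 32 || Bits == 64) &&
         "unexpected integer width");
  if (Bits == 32)
    return Action::Legal;
  return Bits < 32 ? Action::Promote : Action::Expand;
}

// A DAG is legal with respect to types only if every value type is legal.
bool allTypesLegal(const std::vector<unsigned> &ValueWidths) {
  for (unsigned Bits : ValueWidths)
    if (actionForIntegerType(Bits) != Action::Legal)
      return false;
  return true;
}
```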
 794
 795 </div>
 796
 797 <!-- _______________________________________________________________________ -->
 798 <div class="doc_subsubsection">
 799   <a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
 800 </div>
 801
 802 <div class="doc_text">
 803
 804 <p>SelectionDAG-based instruction selection consists of the following steps:</p>
 805
 806 <ol>
 807 <li><a href="#selectiondag_build">Build initial DAG</a> - This stage
 808     performs a simple translation from the input LLVM code to an illegal
 809     SelectionDAG.</li>
 810 <li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> - This stage
 811     performs simple optimizations on the SelectionDAG to simplify it, and
 812     recognize meta instructions (like rotates and <tt>div</tt>/<tt>rem</tt>
 813     pairs) for targets that support these meta operations.  This makes the
 814     resultant code more efficient and the <a href="#selectiondag_select">select
 815     instructions from DAG</a> phase (below) simpler.</li>
 816 <li><a href="#selectiondag_legalize">Legalize SelectionDAG</a> - This stage
 817     converts the illegal SelectionDAG to a legal SelectionDAG by eliminating
 818     unsupported operations and data types.</li>
<li><a href="#selectiondag_optimize">Optimize SelectionDAG (#2)</a> - This
    second run of the SelectionDAG optimizer cleans up the newly legalized DAG
    to eliminate inefficiencies introduced by legalization.</li>
 822 <li><a href="#selectiondag_select">Select instructions from DAG</a> - Finally,
 823     the target instruction selector matches the DAG operations to target
 824     instructions.  This process translates the target-independent input DAG into
 825     another DAG of target instructions.</li>
 826 <li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
 827     - The last phase assigns a linear order to the instructions in the
 828     target-instruction DAG and emits them into the MachineFunction being
 829     compiled.  This step uses traditional prepass scheduling techniques.</li>
 830 </ol>
 831
 832 <p>After all of these steps are complete, the SelectionDAG is destroyed and the
 833 rest of the code generation passes are run.</p>
 834
 835 <p>One great way to visualize what is going on here is to take advantage of a
 836 few LLC command line options.  In particular, the <tt>-view-isel-dags</tt>
 837 option pops up a window with the SelectionDAG input to the Select phase for all
 838 of the code compiled (if you only get errors printed to the console while using
 839 this, you probably <a href="ProgrammersManual.html#ViewGraph">need to configure
 840 your system</a> to add support for it).  The <tt>-view-sched-dags</tt> option
 841 views the SelectionDAG output from the Select phase and input to the Scheduler
 842 phase.</p>
 843
 844 </div>
 845
 846 <!-- _______________________________________________________________________ -->
 847 <div class="doc_subsubsection">
 848   <a name="selectiondag_build">Initial SelectionDAG Construction</a>
 849 </div>
 850
 851 <div class="doc_text">
 852
<p>The initial SelectionDAG is naively peephole expanded from the LLVM input by
the <tt>SelectionDAGLowering</tt> class in the
<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file.  The intent of this
pass is to expose as many low-level, target-specific details to the SelectionDAG
as possible.  This pass is mostly hard-coded (e.g. an LLVM <tt>add</tt> turns
into an <tt>SDNode add</tt> while a <tt>getelementptr</tt> is expanded into the
obvious arithmetic). This pass requires target-specific hooks to lower calls,
returns, varargs, etc.  For these features, the
<tt><a href="#targetlowering">TargetLowering</a></tt> interface is used.</p>
 862
 863 </div>
 864
 865 <!-- _______________________________________________________________________ -->
 866 <div class="doc_subsubsection">
 867   <a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
 868 </div>
 869
 870 <div class="doc_text">
 871
 872 <p>The Legalize phase is in charge of converting a DAG to only use the types and
 873 operations that are natively supported by the target.  This involves two major
 874 tasks:</p>
 875
 876 <ol>
 877 <li><p>Convert values of unsupported types to values of supported types.</p>
 878     <p>There are two main ways of doing this: converting small types to
 879        larger types ("promoting"), and breaking up large integer types
 880        into smaller ones ("expanding").  For example, a target might require
 881        that all f32 values are promoted to f64 and that all i1/i8/i16 values
 882        are promoted to i32.  The same target might require that all i64 values
 883        be expanded into i32 values.  These changes can insert sign and zero
 884        extensions as needed to make sure that the final code has the same
 885        behavior as the input.</p>
 886     <p>A target implementation tells the legalizer which types are supported
 887        (and which register class to use for them) by calling the
 888        <tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
 889 </li>
 890
 891 <li><p>Eliminate operations that are not supported by the target.</p>
 892     <p>Targets often have weird constraints, such as not supporting every
 893        operation on every supported datatype (e.g. X86 does not support byte
 894        conditional moves and PowerPC does not support sign-extending loads from
 895        a 16-bit memory location).  Legalize takes care of this by open-coding
 896        another sequence of operations to emulate the operation ("expansion"), by
 897        promoting one type to a larger type that supports the operation
 898        ("promotion"), or by using a target-specific hook to implement the
 899        legalization ("custom").</p>
 900     <p>A target implementation tells the legalizer which operations are not
 901        supported (and which of the above three actions to take) by calling the
 902        <tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
 903        constructor.</p>
 904 </li>
 905 </ol>
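<p>As a worked example of "expanding", the sketch below open-codes a 64-bit
add using only 32-bit operations, which is the kind of sequence a legalizer has
to produce when i64 is not a legal type.  It is plain C++ written for
illustration, not code taken from the Legalize pass, and the names
<tt>Parts</tt>, <tt>split</tt>, <tt>join</tt>, and <tt>expandAdd64</tt> are
hypothetical.</p>

```cpp
#include <cstdint>

// A 64-bit value split into two legal 32-bit halves.
struct Parts {
  uint32_t Lo;
  uint32_t Hi;
};

Parts split(uint64_t V) {
  return {static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
}

uint64_t join(Parts P) {
  return (static_cast<uint64_t>(P.Hi) << 32) | P.Lo;
}

// i64 add using 32-bit operations only: add the low halves, detect the
// carry out of the low word, then add the high halves plus that carry.
Parts expandAdd64(Parts A, Parts B) {
  uint32_t Lo = A.Lo + B.Lo;          // wraps modulo 2^32
  uint32_t Carry = Lo < A.Lo ? 1 : 0; // set when the low add overflowed
  uint32_t Hi = A.Hi + B.Hi + Carry;  // also wraps modulo 2^32
  return {Lo, Hi};
}
```

<p>Joining the two result halves gives the same value as a native 64-bit add,
including when the addition wraps around.</p>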
 906
<p>Prior to the existence of the Legalize pass, we required that every target
<a href="#selectiondag_select">selector</a> support and handle every
operator and type, even those that are not natively supported.  The introduction of
the Legalize phase allows all of the canonicalization patterns to be shared
across targets, and makes it very easy to optimize the canonicalized code
because it is still in the form of a DAG.</p>
 913
 914 </div>
 915
 916 <!-- _______________________________________________________________________ -->
 917 <div class="doc_subsubsection">
 918   <a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
 919   Combiner</a>
 920 </div>
 921
 922 <div class="doc_text">
 923
 924 <p>The SelectionDAG optimization phase is run twice for code generation: once
 925 immediately after the DAG is built and once after legalization.  The first run
 926 of the pass allows the initial code to be cleaned up (e.g. performing
 927 optimizations that depend on knowing that the operators have restricted type
 928 inputs).  The second run of the pass cleans up the messy code generated by the
 929 Legalize pass, which allows Legalize to be very simple (it can focus on making
 930 code legal instead of focusing on generating <em>good</em> and legal code).</p>
 931
 932 <p>One important class of optimizations performed is optimizing inserted sign
 933 and zero extension instructions.  We currently use ad-hoc techniques, but could
 934 move to more rigorous techniques in the future.  Here are some good papers on
 935 the subject:</p>
 936
 937 <p>
 938  "<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
 939  integer arithmetic</a>"<br>
 940  Kevin Redwine and Norman Ramsey<br>
 941  International Conference on Compiler Construction (CC) 2004
 942 </p>
 943
 944
 945 <p>
 946  "<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
 947  sign extension elimination</a>"<br>
 948  Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
 949  Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
 950  and Implementation.
 951 </p>
 952
 953 </div>
 954
 955 <!-- _______________________________________________________________________ -->
 956 <div class="doc_subsubsection">
 957   <a name="selectiondag_select">SelectionDAG Select Phase</a>
 958 </div>
 959
 960 <div class="doc_text">
 961
 962 <p>The Select phase is the bulk of the target-specific code for instruction
 963 selection.  This phase takes a legal SelectionDAG as input, pattern matches the
 964 instructions supported by the target to this DAG, and produces a new DAG of
 965 target code.  For example, consider the following LLVM fragment:</p>
 966
 967 <div class="doc_code">
 968 <pre>
 969 %t1 = add float %W, %X
 970 %t2 = mul float %t1, %Y
 971 %t3 = add float %t2, %Z
 972 </pre>
 973 </div>
 974
 975 <p>This LLVM code corresponds to a SelectionDAG that looks basically like
 976 this:</p>
 977
 978 <div class="doc_code">
 979 <pre>
 980 (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
 981 </pre>
 982 </div>
 983
 984 <p>If a target supports floating point multiply-and-add (FMA) operations, one
 985 of the adds can be merged with the multiply.  On the PowerPC, for example, the
 986 output of the instruction selector might look like this DAG:</p>
 987
 988 <div class="doc_code">
 989 <pre>
 990 (FMADDS (FADDS W, X), Y, Z)
 991 </pre>
 992 </div>
 993
 994 <p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
 995 first two operands and adds the third (as single-precision floating-point
 996 numbers).  The <tt>FADDS</tt> instruction is a simple binary single-precision
 997 add instruction.  To perform this pattern match, the PowerPC backend includes
 998 the following instruction definitions:</p>
 999
1000 <div class="doc_code">
1001 <pre>
1002 def FMADDS : AForm_1&lt;59, 29,
1003                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
1004                     "fmadds $FRT, $FRA, $FRC, $FRB",
1005                     [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
1006                                            F4RC:$FRB))</b>]&gt;;
1007 def FADDS : AForm_2&lt;59, 21,
1008                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
1009                     "fadds $FRT, $FRA, $FRB",
1010                     [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
1011 </pre>
1012 </div>
1013
<p>The portion of the instruction definition in bold indicates the pattern used
to match the instruction.  The DAG operators (like <tt>fmul</tt>/<tt>fadd</tt>)
are defined in the <tt>lib/Target/TargetSelectionDAG.td</tt> file.
"<tt>F4RC</tt>" is the register class of the input and result values.</p>
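<p>To give a feel for what matching these two patterns involves, here is a
hand-written sketch over a toy expression tree.  It is not the code that
TableGen generates: <tt>Expr</tt>, <tt>leaf</tt>, <tt>node</tt>,
<tt>selectInstrs</tt>, and <tt>toString</tt> are hypothetical helpers, and only
<tt>fadd</tt> is handled.  The matcher tries the <tt>fmul</tt> on either side of
the <tt>fadd</tt>, since addition is commutative.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree standing in for a SelectionDAG fragment.
struct Expr {
  std::string Op;                          // "fadd", "fmul", or a leaf name
  std::vector<std::shared_ptr<Expr>> Kids; // operands; empty for leaves
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string &Name) {
  return std::make_shared<Expr>(Expr{Name, {}});
}
ExprPtr node(const std::string &Op, std::vector<ExprPtr> Kids) {
  return std::make_shared<Expr>(Expr{Op, std::move(Kids)});
}

// Top-down selection: (fadd (fmul A, C), B) and (fadd B, (fmul A, C)) both
// become (FMADDS A, C, B); any other fadd becomes FADDS.
ExprPtr selectInstrs(const ExprPtr &E) {
  if (E->Kids.empty())
    return E; // leaf value
  if (E->Op == "fadd") {
    assert(E->Kids.size() == 2 && "fadd is binary");
    for (int MulIdx = 0; MulIdx < 2; ++MulIdx) {
      const ExprPtr &Mul = E->Kids[MulIdx];
      if (Mul->Op == "fmul") {
        assert(Mul->Kids.size() == 2 && "fmul is binary");
        const ExprPtr &Addend = E->Kids[1 - MulIdx];
        return node("FMADDS", {selectInstrs(Mul->Kids[0]),
                               selectInstrs(Mul->Kids[1]),
                               selectInstrs(Addend)});
      }
    }
    return node("FADDS", {selectInstrs(E->Kids[0]), selectInstrs(E->Kids[1])});
  }
  // Other operators keep their name in this sketch; only operands are visited.
  std::vector<ExprPtr> Kids;
  for (const ExprPtr &K : E->Kids)
    Kids.push_back(selectInstrs(K));
  return node(E->Op, Kids);
}

// Prints a tree in the parenthesized notation used in this section.
std::string toString(const ExprPtr &E) {
  if (E->Kids.empty())
    return E->Op;
  std::string S = "(" + E->Op;
  for (std::size_t I = 0; I < E->Kids.size(); ++I)
    S += (I == 0 ? " " : ", ") + toString(E->Kids[I]);
  return S + ")";
}
```

<p>Applied to the tree for <tt>(fadd (fmul (fadd W, X), Y), Z)</tt>, this
produces <tt>(FMADDS (FADDS W, X), Y, Z)</tt>, and it produces the same result
when the operands of the outer <tt>fadd</tt> are swapped.</p>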
1018
1019 <p>The TableGen DAG instruction selector generator reads the instruction
1020 patterns in the <tt>.td</tt> file and automatically builds parts of the pattern
1021 matching code for your target.  It has the following strengths:</p>
1022
1023 <ul>
1024 <li>At compiler-compiler time, it analyzes your instruction patterns and tells
1025     you if your patterns make sense or not.</li>
<li>It can handle arbitrary constraints on operands for the pattern match.  In
    particular, it is straightforward to say things like "match any immediate
    that is a 13-bit sign-extended value".  For examples, see the
    <tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
    backend.</li>
1031 <li>It knows several important identities for the patterns defined.  For
1032     example, it knows that addition is commutative, so it allows the
1033     <tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
1034     well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
1035     to specially handle this case.</li>
1036 <li>It has a full-featured type-inferencing system.  In particular, you should
1037     rarely have to explicitly tell the system what type parts of your patterns
1038     are.  In the <tt>FMADDS</tt> case above, we didn't have to tell
1039     <tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.  It
1040     was able to infer and propagate this knowledge from the fact that
1041     <tt>F4RC</tt> has type 'f32'.</li>
1042 <li>Targets can define their own (and rely on built-in) "pattern fragments".
1043     Pattern fragments are chunks of reusable patterns that get inlined into your
1044     patterns during compiler-compiler time.  For example, the integer
1045     "<tt>(not x)</tt>" operation is actually defined as a pattern fragment that
1046     expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not have a
1047     native '<tt>not</tt>' operation.  Targets can define their own short-hand
1048     fragments as they see fit.  See the definition of '<tt>not</tt>' and
1049     '<tt>ineg</tt>' for examples.</li>
1050 <li>In addition to instructions, targets can specify arbitrary patterns that
1051     map to one or more instructions using the 'Pat' class.  For example,
1052     the PowerPC has no way to load an arbitrary integer immediate into a
1053     register in one instruction. To tell tblgen how to do this, it defines:
1054     <br>
1055     <br>
1056     <div class="doc_code">
1057     <pre>
1058 // Arbitrary immediate support.  Implement in terms of LIS/ORI.
1059 def : Pat&lt;(i32 imm:$imm),
1060           (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
1061     </pre>
1062     </div>
1063     <br>
1064     If none of the single-instruction patterns for loading an immediate into a
1065     register match, this will be used.  This rule says "match an arbitrary i32
1066     immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and an
1067     <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to the
1068     left 16 bits') instruction".  To make this work, the
1069     <tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate the
1070     input immediate (in this case, take the high or low 16-bits of the
1071     immediate).</li>
1072 <li>While the system does automate a lot, it still allows you to write custom
1073     C++ code to match special cases if there is something that is hard to
1074     express.</li>
1075 </ul>
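<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern in the list above
can be checked with a few lines of C++.  This is only a model of the values
involved: the functions below are ordinary integer helpers that borrow the
instruction and transformation names, on the assumption of a 32-bit register,
and they are not the TableGen definitions themselves.</p>

```cpp
#include <cstdint>

// Node transformations: the upper and lower 16 bits of a 32-bit immediate.
uint16_t HI16(uint32_t Imm) { return static_cast<uint16_t>(Imm >> 16); }
uint16_t LO16(uint32_t Imm) { return static_cast<uint16_t>(Imm & 0xFFFF); }

// Effect of LIS: load a 16-bit immediate shifted left by 16 bits.
uint32_t LIS(uint16_t Imm) { return static_cast<uint32_t>(Imm) << 16; }

// Effect of ORI: or a zero-extended 16-bit immediate into a register value.
uint32_t ORI(uint32_t Reg, uint16_t Imm) { return Reg | Imm; }

// (ORI (LIS (HI16 imm)), (LO16 imm)) rebuilds the original immediate.
uint32_t materialize(uint32_t Imm) { return ORI(LIS(HI16(Imm)), LO16(Imm)); }
```

<p>For any 32-bit immediate, <tt>materialize</tt> returns the value it was
given, which is why the two-instruction sequence is a valid fallback.</p>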
1076
<p>While it has many strengths, the system currently has some limitations,
primarily because it is a work in progress:</p>
1079
1080 <ul>
1081 <li>Overall, there is no way to define or match SelectionDAG nodes that define
1082     multiple values (e.g. <tt>ADD_PARTS</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
1083     etc).  This is the biggest reason that you currently still <em>have to</em>
1084     write custom C++ code for your instruction selector.</li>
1085 <li>There is no great way to support matching complex addressing modes yet.  In
1086     the future, we will extend pattern fragments to allow them to define
1087     multiple values (e.g. the four operands of the <a href="#x86_memory">X86
1088     addressing mode</a>).  In addition, we'll extend fragments so that a
1089     fragment can match multiple different patterns.</li>
1090 <li>We don't automatically infer flags like isStore/isLoad yet.</li>
<li>We don't automatically generate the set of supported registers and
    operations for the <a href="#selectiondag_legalize">Legalizer</a> yet.</li>
1093 <li>We don't have a way of tying in custom legalized nodes yet.</li>
1094 </ul>
1095
1096 <p>Despite these limitations, the instruction selector generator is still quite
1097 useful for most of the binary and logical operations in typical instruction
1098 sets.  If you run into any problems or can't figure out how to do something,
1099 please let Chris know!</p>
1100
1101 </div>
1102
1103 <!-- _______________________________________________________________________ -->
1104 <div class="doc_subsubsection">
1105   <a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
1106 </div>
1107
1108 <div class="doc_text">
1109
<p>The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order.  The scheduler can pick an order depending on
various constraints of the machine (e.g. order for minimal register pressure or
try to cover instruction latencies).  Once an order is established, the DAG is
converted to a list of <tt><a href="#machineinstr">MachineInstr</a></tt>s and
the SelectionDAG is destroyed.</p>
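<p>At its simplest, assigning an order means emitting each node only after all
of its operands.  The sketch below does this with a ready list over a DAG given
as index lists.  It is an illustration under simplifying assumptions, not the
LLVM scheduler: <tt>scheduleDAG</tt> is a hypothetical name, and the ready list
is ordered by node index where a real scheduler would use a priority based on
register pressure or latency.</p>

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Deps[I] lists the nodes that node I uses as operands.
// Returns one linear order in which every node follows its operands.
std::vector<int> scheduleDAG(const std::vector<std::vector<int>> &Deps) {
  const int N = static_cast<int>(Deps.size());
  std::vector<int> Pending(N, 0);         // operands not yet emitted
  std::vector<std::vector<int>> Users(N); // reverse edges
  for (int I = 0; I < N; ++I) {
    Pending[I] = static_cast<int>(Deps[I].size());
    for (int D : Deps[I]) {
      assert(D >= 0 && D < N && "operand index out of range");
      Users[D].push_back(I);
    }
  }

  // Ready list: nodes whose operands have all been emitted, lowest index first.
  std::priority_queue<int, std::vector<int>, std::greater<int>> Ready;
  for (int I = 0; I < N; ++I)
    if (Pending[I] == 0)
      Ready.push(I);

  std::vector<int> Order;
  while (!Ready.empty()) {
    int Cur = Ready.top();
    Ready.pop();
    Order.push_back(Cur);
    for (int U : Users[Cur])
      if (--Pending[U] == 0)
        Ready.push(U);
  }
  assert(Order.size() == Deps.size() && "input graph has a cycle");
  return Order;
}
```

<p>Replacing the index-based priority with a different comparison changes which
of the valid orders is chosen without affecting correctness.</p>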
1116
1117 <p>Note that this phase is logically separate from the instruction selection
1118 phase, but is tied to it closely in the code because it operates on
1119 SelectionDAGs.</p>
1120
1121 </div>
1122
1123 <!-- _______________________________________________________________________ -->
1124 <div class="doc_subsubsection">
1125   <a name="selectiondag_future">Future directions for the SelectionDAG</a>
1126 </div>
1127
1128 <div class="doc_text">
1129
1130 <ol>
<li>Optional function-at-a-time selection.</li>
<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
1134 </ol>
1135
1136 </div>
1137
1138 <!-- ======================================================================= -->
1139 <div class="doc_subsection">
1140   <a name="ssamco">SSA-based Machine Code Optimizations</a>
1141 </div>
1142 <div class="doc_text"><p>To Be Written</p></div>
1143 <!-- ======================================================================= -->
1144 <div class="doc_subsection">
1145   <a name="regalloc">Register Allocation</a>
1146 </div>
1147 <div class="doc_text"><p>To Be Written</p></div>
1148 <!-- ======================================================================= -->
1149 <div class="doc_subsection">
1150   <a name="proepicode">Prolog/Epilog Code Insertion</a>
1151 </div>
1152 <div class="doc_text"><p>To Be Written</p></div>
1153 <!-- ======================================================================= -->
1154 <div class="doc_subsection">
1155   <a name="latemco">Late Machine Code Optimizations</a>
1156 </div>
1157 <div class="doc_text"><p>To Be Written</p></div>
1158 <!-- ======================================================================= -->
1159 <div class="doc_subsection">
1160   <a name="codeemit">Code Emission</a>
1161 </div>
1162 <div class="doc_text"><p>To Be Written</p></div>
1163 <!-- _______________________________________________________________________ -->
1164 <div class="doc_subsubsection">
1165   <a name="codeemit_asm">Generating Assembly Code</a>
1166 </div>
1167 <div class="doc_text"><p>To Be Written</p></div>
1168 <!-- _______________________________________________________________________ -->
1169 <div class="doc_subsubsection">
1170   <a name="codeemit_bin">Generating Binary Machine Code</a>
1171 </div>
1172
1173 <div class="doc_text">
1174    <p>For the JIT or <tt>.o</tt> file writer</p>
1175 </div>
1176
1177
1178 <!-- *********************************************************************** -->
1179 <div class="doc_section">
1180   <a name="targetimpls">Target-specific Implementation Notes</a>
1181 </div>
1182 <!-- *********************************************************************** -->
1183
1184 <div class="doc_text">
1185
1186 <p>This section of the document explains features or design decisions that
1187 are specific to the code generator for a particular target.</p>
1188
1189 </div>
1190
1191
1192 <!-- ======================================================================= -->
1193 <div class="doc_subsection">
1194   <a name="x86">The X86 backend</a>
1195 </div>
1196
1197 <div class="doc_text">
1198
1199 <p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory.  This
1200 code generator currently targets a generic P6-like processor.  As such, it
1201 produces a few P6-and-above instructions (like conditional moves), but it does
1202 not make use of newer features like MMX or SSE.  In the future, the X86 backend
1203 will have sub-target support added for specific processor families and
1204 implementations.</p>
1205
1206 </div>
1207
1208 <!-- _______________________________________________________________________ -->
1209 <div class="doc_subsubsection">
1210   <a name="x86_tt">X86 Target Triples Supported</a>
1211 </div>
1212
1213 <div class="doc_text">
1214
1215 <p>The following are the known target triples that are supported by the X86
1216 backend.  This is not an exhaustive list, and it would be useful to add those
1217 that people test.</p>
1218
1219 <ul>
1220 <li><b>i686-pc-linux-gnu</b> - Linux</li>
1221 <li><b>i386-unknown-freebsd5.3</b> - FreeBSD 5.3</li>
1222 <li><b>i686-pc-cygwin</b> - Cygwin on Win32</li>
<li><b>i686-pc-mingw32</b> - MinGW on Win32</li>
1224 <li><b>i686-apple-darwin*</b> - Apple Darwin on X86</li>
1225 </ul>
1226
1227 </div>
1228
1229 <!-- _______________________________________________________________________ -->
1230 <div class="doc_subsubsection">
1231   <a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
1232 </div>
1233
1234 <div class="doc_text">
1235
<p>The x86 has a very flexible way of accessing memory.  It is capable of
forming memory addresses of the following form directly in integer
instructions (which use ModR/M addressing):</p>
1239
1240 <div class="doc_code">
1241 <pre>
1242 Base + [1,2,4,8] * IndexReg + Disp32
1243 </pre>
1244 </div>
1245
<p>In order to represent this, LLVM tracks no fewer than four operands for each
memory operand of this form.  This means that the "load" form of '<tt>mov</tt>'
has the following <tt>MachineOperand</tt>s in this order:</p>
1249
1250 <pre>
1251 Index:        0     |    1        2       3           4
1252 Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement
1253 OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm
1254 </pre>
1255
1256 <p>Stores, and all other instructions, treat the four memory operands in the
1257 same way and in the same order.</p>
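<p>As a small worked example, the function below evaluates the address
expression from the four memory operands.  The <tt>X86MemOperand</tt> struct and
<tt>effectiveAddress</tt> are hypothetical; register operands are represented
by the values the registers hold, and the arithmetic is assumed to wrap at 32
bits.</p>

```cpp
#include <cassert>
#include <cstdint>

// The four memory operands, in the order listed above.
struct X86MemOperand {
  uint32_t Base;  // value held in BaseReg
  uint32_t Scale; // 1, 2, 4, or 8
  uint32_t Index; // value held in IndexReg
  int32_t Disp;   // sign-extended displacement
};

// Base + Scale * IndexReg + Disp32, computed modulo 2^32.
uint32_t effectiveAddress(const X86MemOperand &M) {
  assert((M.Scale == 1 || M.Scale == 2 || M.Scale == 4 || M.Scale == 8) &&
         "scale must be 1, 2, 4, or 8");
  return M.Base + M.Scale * M.Index + static_cast<uint32_t>(M.Disp);
}
```

<p>For instance, a base of <tt>0x1000</tt>, a scale of 4, an index of 3, and a
displacement of -8 address <tt>0x1004</tt>.</p>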
1258
1259 </div>
1260
1261 <!-- _______________________________________________________________________ -->
1262 <div class="doc_subsubsection">
1263   <a name="x86_names">Instruction naming</a>
1264 </div>
1265
1266 <div class="doc_text">
1267
<p>An instruction name consists of the base name, a default operand size, and
a character per operand with an optional special size. For example:</p>
1270
1271 <p>
1272 <tt>ADD8rr</tt> -&gt; add, 8-bit register, 8-bit register<br>
1273 <tt>IMUL16rmi</tt> -&gt; imul, 16-bit register, 16-bit memory, 16-bit immediate<br>
1274 <tt>IMUL16rmi8</tt> -&gt; imul, 16-bit register, 16-bit memory, 8-bit immediate<br>
1275 <tt>MOVSX32rm16</tt> -&gt; movsx, 32-bit register, 16-bit memory
1276 </p>
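<p>The naming scheme is regular enough to decode mechanically.  The sketch below
does so for names shaped like the examples above.  It makes assumptions that the
text does not state: that the base name is the leading run of upper-case
letters, and that operand characters are limited to <tt>r</tt>, <tt>m</tt>, and
<tt>i</tt>.  <tt>DecodedName</tt>, <tt>NameOperand</tt>, and
<tt>decodeName</tt> are hypothetical names, not part of the X86 backend.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct NameOperand {
  char Kind;     // 'r' (register), 'm' (memory), or 'i' (immediate)
  unsigned Bits; // operand size in bits
};

struct DecodedName {
  std::string Base;     // lower-cased mnemonic, e.g. "imul"
  unsigned DefaultBits; // default operand size
  std::vector<NameOperand> Operands;
};

// Reads a run of decimal digits starting at Pos; returns 0 if there is none.
unsigned readNumber(const std::string &S, std::size_t &Pos) {
  unsigned V = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    V = V * 10 + static_cast<unsigned>(S[Pos++] - '0');
  return V;
}

DecodedName decodeName(const std::string &Name) {
  DecodedName D;
  std::size_t Pos = 0;
  // Assumption: the base name is the leading run of upper-case letters.
  while (Pos < Name.size() &&
         std::isupper(static_cast<unsigned char>(Name[Pos])))
    D.Base += static_cast<char>(
        std::tolower(static_cast<unsigned char>(Name[Pos++])));
  D.DefaultBits = readNumber(Name, Pos);
  // Each operand is one character, optionally followed by a special size.
  while (Pos < Name.size()) {
    char Kind = Name[Pos++];
    assert((Kind == 'r' || Kind == 'm' || Kind == 'i') &&
           "unknown operand kind");
    unsigned Bits = readNumber(Name, Pos);
    D.Operands.push_back({Kind, Bits ? Bits : D.DefaultBits});
  }
  return D;
}
```

<p>Decoding <tt>IMUL16rmi8</tt> this way yields the base name <tt>imul</tt>, a
default size of 16, and three operands: a 16-bit register, a 16-bit memory
operand, and an 8-bit immediate.</p>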
1277
1278 </div>
1279
1280 <!-- *********************************************************************** -->
1281 <hr>
1282 <address>
1283   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
1284   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
1287
1288   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
1289   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
1290   Last modified: $Date$
1291 </address>
1292
1293 </body>
1294 </html>