X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FPasses.html;h=362be32d7da9fc9623e51a1bef688ad70dcc2dec;hb=0d20ac8d175575fbad1ed410c9fe610cf7255af3;hp=efa69fec5a2a2762eec1d083e474e74a478eb53b;hpb=ddaa61d8b5f9af9f8f0ea5348c413bb9a9321bbc;p=oota-llvm.git diff --git a/docs/Passes.html b/docs/Passes.html index efa69fec5a2..362be32d7da 100644 --- a/docs/Passes.html +++ b/docs/Passes.html @@ -34,6 +34,10 @@ while () { print @x, @y; EOT +This (real) one-liner can also be helpful when converting comments to HTML: + +perl -e '$/ = undef; for (split(/\n/, <>)) { s:^ *///? ?::; print "

\n" if !$on && $_ =~ /\S/; print "

\n" if $on && $_ =~ /^\s*$/; print " $_\n"; $on = ($_ =~ /\S/); } print "

\n" if $on' + -->
LLVM's Analysis and Transform Passes
@@ -46,7 +50,8 @@ EOT
-

Written by Reid Spencer

+

Written by Reid Spencer + and Gordon Henriksen

@@ -74,17 +79,15 @@ EOT -basicaaBasic Alias Analysis (default AA impl) -basiccgBasic CallGraph Construction -basicvnBasic Value Numbering (default GVN impl) --callgraphPrint a call graph --callsccPrint SCCs of the Call Graph --cfgsccPrint SCCs of each function CFG -codegenprepareOptimize for code generation -count-aaCount Alias Analysis Query Responses -debug-aaAA use debugger -domfrontierDominance Frontier Construction -domtreeDominator Tree Construction --externalfnconstantsPrint external fn callsites passed constants +-dot-callgraphPrint Call Graph to 'dot' file +-dot-cfgPrint CFG of function to 'dot' file +-dot-cfg-onlyPrint CFG of function to 'dot' file (with no function bodies) -globalsmodref-aaSimple mod/ref analysis for globals --gvnGlobal Value Numbering -instcountCounts the various types of Instructions -intervalsInterval Partition Construction -load-vnLoad Value Numbering @@ -94,13 +97,14 @@ EOT -no-profileNo Profile Information -postdomfrontierPost-Dominance Frontier Construction -postdomtreePost-Dominator Tree Construction --printPrint function to stderr -print-alias-setsAlias Set Printer --print-callgraphPrint Call Graph to 'dot' file --print-cfgPrint CFG of function to 'dot' file --print-cfg-onlyPrint CFG of function to 'dot' file (with no function bodies) --printmPrint module to stderr --printusedtypesFind Used Types +-print-callgraphPrint a call graph +-print-callgraph-sccsPrint SCCs of the Call Graph +-print-cfg-sccsPrint SCCs of each function CFG +-print-externalfnconstantsPrint external fn callsites passed constants +-print-functionPrint function to stderr +-print-modulePrint module to stderr +-print-used-typesFind Used Types -profile-loaderLoad profile information from llvmprof.out -scalar-evolutionScalar Evolution Analysis -targetdataTarget Data Layout @@ -112,7 +116,7 @@ EOT -argpromotionPromote 'by reference' arguments to scalars -block-placementProfile Guided Basic Block Placement -break-crit-edgesBreak critical edges in CFG --ceeCorrelated 
Expression Elimination +-codegenpreparePrepare a function for code generation -condpropConditional Propagation -constmergeMerge Duplicate Global Constants -constpropSimple constant propagation @@ -124,6 +128,7 @@ EOT -gcseGlobal Common Subexpression Elimination -globaldceDead Global Elimination -globaloptGlobal Variable Optimizer +-gvnGlobal Value Numbering -gvnpreGlobal Value Numbering/Partial Redundancy Elimination -indmemremIndirect Malloc and Free Removal -indvarsCanonicalize Induction Variables @@ -137,8 +142,10 @@ EOT -internalizeInternalize Global Symbols -ipconstpropInterprocedural constant propagation -ipsccpInterprocedural Sparse Conditional Constant Propagation +-jump-threadingThread control through conditional blocks -lcssaLoop-Closed SSA Form Pass -licmLoop Invariant Code Motion +-loop-deletionDead Loop Deletion Pass -loop-extractExtract loops into new functions -loop-extract-singleExtract at most one loop into a new function -loop-index-splitIndex Split Loops @@ -147,14 +154,12 @@ EOT -loop-unrollUnroll loops -loop-unswitchUnswitch loops -loopsimplifyCanonicalize natural loops --lower-packedlowers packed operations to operations on smaller packed datatypes -lowerallocsLower allocations from instructions to calls --lowergcLower GC intrinsics, for GCless code generators -lowerinvokeLower invoke and unwind, for unwindless code generators --lowerselectLower select instructions to branches -lowersetjmpLower Set Jump -lowerswitchLower SwitchInst's to branches -mem2regPromote Memory to Register +-memcpyoptOptimize use of memcpy and friends -mergereturnUnify function exit nodes -predsimplifyPredicate Simplifier -prune-ehRemove unused exception handling info @@ -166,6 +171,8 @@ EOT -simplify-libcallsSimplify well-known library calls -simplifycfgSimplify the CFG -stripStrip all symbols from a module +-strip-dead-prototypesRemove unused function declarations +-sretpromotionPromote sret arguments -tailcallelimTail Call Elimination -tailduplicateTail Duplication 
@@ -174,7 +181,7 @@ EOT OptionName -deadarghaX0rDead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE) -extract-blocksExtract Basic Blocks From Module (for bugpoint use) --emitbitcodeBitcode Writer +-preverifyPreliminary module verification -verifyModule Verifier -view-cfgView CFG of function -view-cfg-onlyView CFG of function (with no function bodies) @@ -192,7 +199,13 @@ EOT Exhaustive Alias Analysis Precision Evaluator
-

Yet to be written.

+

This is a simple N^2 alias analysis accuracy evaluator. + Basically, for each function in the program, it simply queries to see how the + alias analysis implementation answers alias queries between each pair of + pointers in the function.

+ +

This is inspired and adapted from code by: Naveen Neelakantam, Francesco + Spadini, and Wojciech Stryjewski.

@@ -200,7 +213,73 @@ EOT Andersen's Interprocedural Alias Analysis
-

Yet to be written.

+

+ This is an implementation of Andersen's interprocedural alias + analysis +

+ +

+ In pointer analysis terms, this is a subset-based, flow-insensitive, + field-sensitive, and context-insensitive pointer algorithm. +

+ +

+ This algorithm is implemented as four stages: +

+ +
+
+  1. Object identification.
+  2. Inclusion constraint identification.
+  3. Offline constraint graph optimization.
+  4. Inclusion constraint solving.
+ +

+ The object identification stage identifies all of the memory objects in the + program, which includes globals, heap allocated objects, and stack allocated + objects. +

+ +

+ The inclusion constraint identification stage finds all inclusion constraints + in the program by scanning the program, looking for pointer assignments and + other statements that affect the points-to graph. For a statement like + A = B, this statement is processed to + indicate that A can point to anything that B can point + to. Constraints can handle copies, loads, stores, and address taking. +

+ +

+ The offline constraint graph optimization portion includes offline variable + substitution algorithms intended to compute pointer and location + equivalences. Pointer equivalences are those pointers that will have the + same points-to sets, and location equivalences are those variables that + always appear together in points-to sets. +

+ +

+ The inclusion constraint solving phase iteratively propagates the inclusion + constraints until a fixed point is reached. This is an O(n³) + algorithm. +

+ +

+ Function constraints are handled as if they were structs with X + fields. Thus, an access to argument X of function Y is + an access to node index getNode(Y) + X. + This representation allows handling of indirect calls without any issues. To + wit, an indirect call Y(a,b) is + equivalent to *(Y + 1) = a, *(Y + 2) = + b. The return node for a function F is always + located at getNode(F) + CallReturnPos. The arguments + start at getNode(F) + CallArgPos. +

+ +

+ Please keep in mind that the current Andersen's pass has many known + problems and bugs. It should be considered "research quality". +

+
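The constraint solving stage described above can be sketched roughly as follows. This is a minimal, field-insensitive illustration and not the pass's actual implementation: the function name `solve`, the tuple encoding of the four constraint kinds, and the naive re-iteration without the offline optimization stage are all assumptions made for the example.

```python
from collections import defaultdict


def solve(address_of, copies, loads, stores):
    """Naive fixed-point solver for inclusion (subset) constraints.

    address_of: (p, x) pairs for  p = &x
    copies:     (p, q) pairs for  p = q
    loads:      (p, q) pairs for  p = *q
    stores:     (p, q) pairs for  *p = q
    Returns a dict mapping each pointer to its non-empty points-to set.
    """
    pts = defaultdict(set)
    for p, x in address_of:
        pts[p].add(x)

    def include(dst, src):
        """Make pts[dst] a superset of pts[src]; report whether it grew."""
        before = len(pts[dst])
        pts[dst] |= pts[src]
        return len(pts[dst]) != before

    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for p, q in copies:
            changed |= include(p, q)
        for p, q in loads:
            for target in list(pts[q]):
                changed |= include(p, target)
        for p, q in stores:
            for target in list(pts[p]):
                changed |= include(target, q)
    return {v: s for v, s in pts.items() if s}


# a = &x; b = &y; p = &a; q = p; *q = b; c = *p
result = solve(
    address_of=[("a", "x"), ("b", "y"), ("p", "a")],
    copies=[("q", "p")],
    loads=[("c", "p")],
    stores=[("q", "b")],
)
```

Because the store through `q` may write `b` into `a`, the set for `a` grows to include `y`, and the later load through `p` picks that up for `c`.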
@@ -208,7 +287,11 @@ EOT Basic Alias Analysis (default AA impl)
-

Yet to be written.

+

+ This is the default implementation of the Alias Analysis interface + that simply implements a few identities (two different globals cannot alias, + etc.), but otherwise does no analysis. +

@@ -221,82 +304,120 @@ EOT
- Basic Value Numbering (default GVN impl) + Basic Value Numbering (default Value Numbering impl)
-

Yet to be written.

+

+ This is the default implementation of the ValueNumbering + interface. It walks the SSA def-use chains to trivially identify + lexically identical expressions. This does not require any ahead of time + analysis, so it is a very fast default implementation. +

+

+ The ValueNumbering analysis passes are mostly deprecated. They are only used + by the Global Common Subexpression Elimination pass, which + is deprecated by the Global Value Numbering pass (which + does its value numbering on its own). +
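The idea of giving lexically identical expressions the same value number can be illustrated with a small sketch. Note that this swaps in a hash-table formulation over a straight-line instruction list rather than walking SSA def-use chains as the pass is described to do; the name `value_number` and the `(name, opcode, operands)` encoding are hypothetical.

```python
def value_number(instrs):
    """Assign value numbers to a straight-line list of instructions.

    instrs is an ordered list of (name, opcode, operands) tuples.
    Two instructions get the same number when they have the same opcode
    and their operands have the same value numbers (or are the same
    undefined names, such as function arguments).
    """
    numbers = {}  # instruction name -> value number
    table = {}    # (opcode, operand keys) -> value number
    for name, opcode, operands in instrs:
        key = (opcode, tuple(numbers.get(op, op) for op in operands))
        if key not in table:
            table[key] = len(table)
        numbers[name] = table[key]
    return numbers


vn = value_number([
    ("a", "add", ("x", "y")),
    ("b", "add", ("x", "y")),  # lexically identical to a
    ("c", "mul", ("a", "b")),
    ("d", "sub", ("x", "y")),  # different opcode, so a different number
])
```

This sketch ignores control flow, memory, and commutativity; it only shows why no ahead-of-time analysis is needed for the trivial case.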

- Print a call graph -
-
-

Yet to be written.

-
- - -
- Print SCCs of the Call Graph + Optimize for code generation
-

Yet to be written.

+

+ This pass munges the code in the input function to better prepare it for + SelectionDAG-based code generation. This works around limitations in its + basic-block-at-a-time approach. It should eventually be removed. +

- Print SCCs of each function CFG + Count Alias Analysis Query Responses
-

Yet to be written.

+

+ A pass which can be used to count how many alias queries + are being made and how the alias analysis implementation being used responds. +

- Optimize for code generation + AA use debugger
-

Yet to be written.

+

+ This simple pass checks alias analysis users to ensure that if they + create a new value, they do not query AA without informing it of the value. + It acts as a shim over any other AA pass you want. +

+ +

+ Yes, keeping track of every value in the program is expensive, but this is + a debugging pass. +

- Count Alias Analysis Query Responses + Dominance Frontier Construction
-

Yet to be written.

+

+ This pass is a simple dominator construction algorithm for finding forward + dominator frontiers. +

- AA use debugger + Dominator Tree Construction
-

Yet to be written.

+

+ This pass is a simple dominator construction algorithm for finding forward + dominators. +
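As an illustration of what forward dominator information looks like, here is a sketch of the classic iterative set-based formulation (a block's dominators are itself plus the intersection of its predecessors' dominators). It is not claimed to be the algorithm this pass uses; the `dominators` function and the dict-of-successors CFG encoding are assumptions for the example.

```python
def dominators(cfg, entry):
    """Compute dominator sets for a CFG given as {block: [successors]}."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# A diamond: entry branches to a and b, which both reach exit.
dom = dominators(
    {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []},
    "entry",
)
```

In the diamond, neither `a` nor `b` dominates `exit`, because each can be bypassed through the other.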

- Dominance Frontier Construction + Print Call Graph to 'dot' file
-

Yet to be written.

+

+ This pass, only available in opt, prints the call graph into a + .dot graph. This graph can then be processed with the "dot" tool + to convert it to postscript or some other suitable format. +

- Dominator Tree Construction + Print CFG of function to 'dot' file
-

Yet to be written.

+

+ This pass, only available in opt, prints the control flow graph + into a .dot graph. This graph can then be processed with the + "dot" tool to convert it to postscript or some other suitable format. +

- Print external fn callsites passed constants + Print CFG of function to 'dot' file (with no function bodies)
-

Yet to be written.

+

+ This pass, only available in opt, prints the control flow graph + into a .dot graph, omitting the function bodies. This graph can + then be processed with the "dot" tool to convert it to postscript or some + other suitable format. +

@@ -304,15 +425,12 @@ EOT Simple mod/ref analysis for globals
-

Yet to be written.

-
- - -
- Global Value Numbering -
-
-

Yet to be written.

+

+ This simple pass provides alias and mod/ref information for global values + that do not have their address taken, and keeps track of whether functions + read or write memory (are "pure"). For this simple (but very common) case, + we can provide pretty accurate and useful information. +

@@ -320,7 +438,9 @@ EOT Counts the various types of Instructions
-

Yet to be written.

+

+ This pass collects the count of all instructions and reports them. +

@@ -328,7 +448,15 @@ EOT Interval Partition Construction
-

Yet to be written.

+

+ This analysis calculates and represents the interval partition of a function, + or a preexisting interval partition. +

+ +

+ In this way, the interval partition may be used to reduce a flow graph down + to its degenerate single node interval partition (unless it is irreducible). +

@@ -336,7 +464,21 @@ EOT Load Value Numbering
-

Yet to be written.

+

+ This pass value numbers load and call instructions. To do this, it finds + lexically identical load instructions, and uses alias analysis to determine + which loads are guaranteed to produce the same value. To value number call + instructions, it looks for calls to functions that do not write to memory + which do not have intervening instructions that clobber the memory that is + read from. +

+ +

+ This pass builds off of another value numbering pass to implement value + numbering for non-load and non-call instructions. It uses Alias Analysis so + that it can disambiguate the load instructions. The more powerful these base + analyses are, the more powerful the resultant value numbering will be. +

@@ -344,7 +486,12 @@ EOT Natural Loop Construction
-

Yet to be written.

+

+ This analysis is used to identify natural loops and determine the loop depth + of various nodes of the CFG. Note that the loops identified may actually be + several natural loops that share the same header node... not just a single + natural loop. +

@@ -352,7 +499,12 @@ EOT Memory Dependence Analysis
-

Yet to be written.

+

+ An analysis that determines, for a given memory operation, what preceding + memory operations it depends on. It builds on alias analysis information, and + tries to provide a lazy, caching interface to a common kind of alias + information query. +

@@ -360,7 +512,11 @@ EOT No Alias Analysis (always returns 'may' alias)
-

Yet to be written.

+

+ Always returns "I don't know" for alias queries. NoAA is unlike other alias + analysis implementations, in that it does not chain to a previous analysis. As + such it doesn't follow many of the rules that other alias analyses must. +

@@ -368,7 +524,10 @@ EOT No Profile Information
-

Yet to be written.

+

+ The default "no profile" implementation of the abstract + ProfileInfo interface. +

@@ -376,7 +535,10 @@ EOT Post-Dominance Frontier Construction
-

Yet to be written.

+

+ This pass is a simple post-dominator construction algorithm for finding + post-dominator frontiers. +

@@ -384,12 +546,15 @@ EOT Post-Dominator Tree Construction
-

Yet to be written.

+

+ This pass is a simple post-dominator construction algorithm for finding + post-dominators. +

- Print function to stderr + Alias Set Printer

Yet to be written.

@@ -397,50 +562,81 @@ EOT
-

Yet to be written.

+

+ This pass, only available in opt, prints the call graph to + standard output in a human-readable form. +

-

Yet to be written.

+

+ This pass, only available in opt, prints the SCCs of the call + graph to standard output in a human-readable form. +

-

Yet to be written.

+

+ This pass, only available in opt, prints the SCCs of each + function CFG to standard output in a human-readable form. +

-

Yet to be written.

+

+ This pass, only available in opt, prints out call sites to + external functions that are called with constant arguments. This can be + useful when looking for standard library functions we should constant fold + or handle in alias analyses. +

-

Yet to be written.

+

+ The PrintFunctionPass class is designed to be pipelined with + other FunctionPasses, and prints out the functions of the module + as they are processed. +

-

Yet to be written.

+

+ This pass simply prints out the entire module when it is executed. +

+
+ + + +
+

+ This pass is used to seek out all of the types in use by the program. Note + that this analysis explicitly does not include types only used by the symbol + table.

@@ -448,7 +644,10 @@ EOT Load profile information from llvmprof.out
-

Yet to be written.

+

+ A concrete implementation of profiling information that loads the information + from a profile dump file. +

@@ -456,7 +655,18 @@ EOT Scalar Evolution Analysis
-

Yet to be written.

+

+ The ScalarEvolution analysis can be used to analyze and + categorize scalar expressions in loops. It specializes in recognizing general + induction variables, representing them with the abstract and opaque + SCEV class. Given this analysis, trip counts of loops and other + important properties can be obtained. +

+ +

+ This analysis is primarily useful for induction variable substitution and + strength reduction. +

@@ -464,7 +674,8 @@ EOT Target Data Layout
-

Yet to be written.

+

Provides other passes access to information on the size and alignment + required by the target ABI for various data types.

@@ -489,7 +700,30 @@ EOT Promote 'by reference' arguments to scalars
-

Yet to be written.

+

+ This pass promotes "by reference" arguments to be "by value" arguments. In + practice, this means looking for internal functions that have pointer + arguments. If it can prove, through the use of alias analysis, that an + argument is *only* loaded, then it can pass the value into the function + instead of the address of the value. This can cause recursive simplification + of code and lead to the elimination of allocas (especially in C++ template + code like the STL). +

+ +

+ This pass also handles aggregate arguments that are passed into a function, + scalarizing them if the elements of the aggregate are only loaded. Note that + it refuses to scalarize aggregates which would require passing in more than + three operands to the function, because passing thousands of operands for a + large array or structure is unprofitable! +

+ +

+ Note that this transformation could also be done for arguments that are only + stored to (returning the value instead), but does not currently. This case + would be best handled when and if LLVM starts supporting multiple return + values from functions. +

@@ -497,21 +731,11 @@ EOT Profile Guided Basic Block Placement
-

This pass implements a very simple profile guided basic block placement - algorithm. The idea is to put frequently executed blocks together at the - start of the function, and hopefully increase the number of fall-through - conditional branches. If there is no profile information for a particular - function, this pass basically orders blocks in depth-first order.

-

The algorithm implemented here is basically "Algo1" from "Profile Guided - Code Positioning" by Pettis and Hansen, except that it uses basic block - counts instead of edge counts. This could be improved in many ways, but is - very simple for now.

-

Basically we "place" the entry block, then loop over all successors in a - DFO, placing the most frequently executed successor until we run out of - blocks. Did we mention that this was extremely simplistic? This is - also much slower than it could be. When it becomes important, this pass - will be rewritten to use a better algorithm, and then we can worry about - efficiency.

+

This pass is a very simple profile guided basic block placement algorithm. + The idea is to put frequently executed blocks together at the start of the + function and hopefully increase the number of fall-through conditional + branches. If there is no profile information for a particular function, this + pass basically orders blocks in depth-first order.

@@ -519,32 +743,22 @@ EOT Break critical edges in CFG
-

Yet to be written.

+

+ Break all of the critical edges in the CFG by inserting a dummy basic block. + It may be "required" by passes that cannot deal with critical edges. This + transformation obviously invalidates the CFG, but can update forward dominator + (set, immediate dominators, tree, and frontier) information. +
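For illustration, a critical edge is one whose source block has more than one successor and whose destination block has more than one predecessor. The sketch below splits every such edge with a fresh empty block. The function name, the dict-based CFG encoding, and the `.split` naming scheme are hypothetical, and the sketch assumes no duplicate edges between the same pair of blocks.

```python
def break_critical_edges(cfg):
    """Return a copy of cfg ({block: [successors]}) with critical edges split."""
    pred_count = {}
    for succs in cfg.values():
        for s in succs:
            pred_count[s] = pred_count.get(s, 0) + 1

    result = {}
    for block, succs in cfg.items():
        new_succs = []
        for s in succs:
            if len(succs) > 1 and pred_count[s] > 1:
                # Critical edge: route it through a new dummy block.
                dummy = f"{block}.{s}.split"  # hypothetical naming scheme
                result[dummy] = [s]
                new_succs.append(dummy)
            else:
                new_succs.append(s)
        result[block] = new_succs
    return result


# entry -> merge is critical: entry has two successors, merge two predecessors.
split = break_critical_edges(
    {"entry": ["a", "merge"], "a": ["merge"], "merge": []}
)
```

After splitting, code can be placed on the `entry` to `merge` path without affecting the path through `a`.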

- Correlated Expression Elimination + Prepare a function for code generation
-

Correlated Expression Elimination propagates information from conditional - branches to blocks dominated by destinations of the branch. It propagates - information from the condition check itself into the body of the branch, - allowing transformations like these for example:

- -
-if (i == 7)
-  ... 4*i;  // constant propagation
-
-M = i+1; N = j+1;
-if (i == j)
-  X = M-N;  // = M-M == 0;
-
- -

This is called Correlated Expression Elimination because we eliminate or - simplify expressions that are correlated with the direction of a branch. In - this way we use static information to give us some information about the - dynamic value of a variable.

+ This pass munges the code in the input function to better prepare it for + SelectionDAG-based code generation. This works around limitations in its + basic-block-at-a-time approach. It should eventually be removed.
@@ -561,7 +775,12 @@ if (i == j) Merge Duplicate Global Constants
-

Yet to be written.

+

+ Merges duplicate global constants together into a single constant that is + shared. This is useful because some passes (e.g. TraceValues) insert a lot of + string constants into the program, regardless of whether or not an existing + string is available. +

@@ -585,7 +804,11 @@ if (i == j) Dead Code Elimination
-

Yet to be written.

+

+ Dead code elimination is similar to dead instruction + elimination, but it rechecks instructions that were used by removed + instructions to see if they are newly dead. +
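The rechecking behaviour described above can be sketched with a worklist: whenever an instruction is removed, its operands are revisited because they may have just lost their last use. The instruction encoding and the name `dead_code_elimination` are assumptions for this example, not the pass's real data structures.

```python
def dead_code_elimination(instrs):
    """instrs maps a value name to (operand_names, has_side_effects).

    Returns the set of instruction names that survive.
    """
    live = dict(instrs)
    use_count = {name: 0 for name in live}
    for operands, _ in live.values():
        for op in operands:
            if op in use_count:
                use_count[op] += 1

    worklist = list(live)
    while worklist:
        name = worklist.pop()
        if name not in live:
            continue  # already removed
        operands, side_effects = live[name]
        if side_effects or use_count[name] > 0:
            continue  # still needed
        del live[name]
        for op in operands:
            if op in live:
                use_count[op] -= 1
                worklist.append(op)  # recheck: may be newly dead
    return set(live)


survivors = dead_code_elimination({
    "x": ([], False),
    "y": (["x"], False),
    "z": (["y"], False),   # unused, so z, then y, then x all die
    "w": ([], False),
    "ret": (["w"], True),  # has a side effect, keeps w alive
})
```

A single sweep in definition order, by contrast, would see `x` and `y` while they still have uses and would only remove `z`.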

@@ -593,7 +816,17 @@ if (i == j) Dead Argument Elimination
-

Yet to be written.

+

+ This pass deletes dead arguments from internal functions. Dead argument + elimination removes arguments which are directly dead, as well as arguments + only passed into function calls as dead arguments of other functions. This + pass also deletes dead return values in a similar way. +

+ +

+ This pass is often useful as a cleanup pass to run after aggressive + interprocedural passes, which add possibly-dead arguments. +

@@ -601,7 +834,11 @@ if (i == j) Dead Type Elimination
-

Yet to be written.

+

+ This pass is used to clean up the output of GCC. It eliminates names for types + that are unused in the entire translation unit, using the find used types pass. +

@@ -609,7 +846,10 @@ if (i == j) Dead Instruction Elimination
-

Yet to be written.

+

+ Dead instruction elimination performs a single pass over the function, + removing instructions that are obviously dead. +

@@ -617,7 +857,10 @@ if (i == j) Dead Store Elimination
-

Yet to be written.

+

+ A trivial dead store elimination that only considers basic-block local + redundant stores. +

@@ -625,7 +868,16 @@ if (i == j) Global Common Subexpression Elimination
-

Yet to be written.

+

+ This pass is designed to be a very quick global transformation that + eliminates global common subexpressions from a function. It does this by + using an existing value numbering analysis pass to identify the common + subexpressions, eliminating them when possible. +

+

+ This pass is deprecated by the Global Value Numbering pass + (which does a better job with its own value numbering). +

@@ -633,7 +885,13 @@ if (i == j) Dead Global Elimination
-

Yet to be written.

+

+ This transform is designed to eliminate unreachable internal globals from the + program. It uses an aggressive algorithm, searching out globals that are + known to be alive. After it finds all of the globals which are needed, it + deletes whatever is left over. This allows it to delete recursive chunks of + the program which are unreachable. +
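The "find what is alive, delete the rest" approach described above amounts to a reachability walk. A minimal sketch follows, assuming a hypothetical `references` map from each global to the globals it uses and a caller-supplied set of roots (for example, externally visible symbols).

```python
def global_dce(references, roots):
    """Return the set of globals that are unreachable from the roots."""
    alive = set()
    stack = list(roots)
    while stack:
        g = stack.pop()
        if g in alive:
            continue
        alive.add(g)
        stack.extend(references.get(g, ()))
    # Anything never marked alive is deleted, cycles included.
    return set(references) - alive


dead = global_dce(
    {
        "main": ["helper"],
        "helper": [],
        "dead_a": ["dead_b"],  # dead_a and dead_b only reference each other
        "dead_b": ["dead_a"],
    },
    roots=["main"],
)
```

Because liveness is computed from the roots outward, the mutually recursive pair is removed even though each member still has a use.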

@@ -641,7 +899,26 @@ if (i == j) Global Variable Optimizer
-

Yet to be written.

+

+ This pass transforms simple global variables that never have their address + taken. If obviously true, it marks read/write globals as constant, deletes + variables only stored to, etc. +

+
+ + +
+ Global Value Numbering +
+
+

+ This pass performs global value numbering to eliminate fully redundant + instructions. It also performs simple dead load elimination. +

+

+ Note that this pass does the value numbering itself; it does not use the + ValueNumbering analysis passes. +

@@ -649,7 +926,20 @@ if (i == j) Global Value Numbering/Partial Redundancy Elimination
-

Yet to be written.

+

+ This pass performs a hybrid of global value numbering and partial redundancy + elimination, known as GVN-PRE. It performs partial redundancy elimination on + values, rather than lexical expressions, allowing a more comprehensive view + of the optimization. It replaces redundant values with uses of earlier + occurrences of the same value. While this is beneficial in that it eliminates + unneeded computation, it also increases register pressure by creating large + live ranges, and should be used with caution on platforms that are very + sensitive to register pressure. +

+

+ Note that this pass does the value numbering itself; it does not use the + ValueNumbering analysis passes. +

@@ -657,7 +947,16 @@ if (i == j) Indirect Malloc and Free Removal
-

Yet to be written.

+

+ This pass finds places where memory allocation functions may escape into + indirect land. Some transforms are much easier (aka possible) only if free + or malloc are not called indirectly. +

+ +

+ Thus, this pass finds places where the addresses of memory functions are taken and constructs + bounce functions with direct calls of those functions. +

@@ -665,7 +964,50 @@ if (i == j) Canonicalize Induction Variables
-

Yet to be written.

+

+ This transformation analyzes and transforms the induction variables (and + computations derived from them) into simpler forms suitable for subsequent + analysis and transformation. +

+ +

+ This transformation makes the following changes to each loop with an + identifiable induction variable: +

+ +
+
+  1. All loops are transformed to have a single canonical
+     induction variable which starts at zero and steps by one.
+  2. The canonical induction variable is guaranteed to be the first PHI node
+     in the loop header block.
+  3. Any pointer arithmetic recurrences are raised to use array
+     subscripts.
+ +

+ If the trip count of a loop is computable, this pass also makes the following + changes: +

+ +
+
+  1. The exit condition for the loop is canonicalized to compare the
+     induction value against the exit value. This turns loops like:
+       for (i = 7; i*i < 1000; ++i)
+     into
+       for (i = 0; i != 25; ++i)
+  2. Any use outside of the loop of an expression derived from the indvar
+     is changed to compute the derived value outside of the loop, eliminating
+     the dependence on the exit value of the induction variable. If the only
+     purpose of the loop is to compute the exit value of some derived
+     expression, this transformation will make the loop dead.
+ +
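The exit-condition example above can be checked with a short script: the original loop runs for i = 7 through 31 (since 31*31 = 961 and 32*32 = 1024), which is 25 iterations, matching the canonical loop that counts from 0 to 25. The function names here are illustrative only.

```python
def original_loop():
    """for (i = 7; i*i < 1000; ++i): record each value of i."""
    seen = []
    i = 7
    while i * i < 1000:
        seen.append(i)
        i += 1
    return seen


def canonical_loop():
    """for (i = 0; i != 25; ++i): recover the original value as i + 7."""
    seen = []
    i = 0
    while i != 25:
        seen.append(i + 7)
        i += 1
    return seen


original = original_loop()
canonical = canonical_loop()
```

Both loops visit the same sequence of original induction values, so a body that used the old `i` can be rewritten in terms of the canonical variable plus 7.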

+ This transformation should be followed by strength reduction after all of the + desired loop transformations have been performed. Additionally, on targets + where it is profitable, the loop could be transformed to count down to zero + (the "do loop" optimization). +

@@ -673,7 +1015,9 @@ if (i == j) Function Integration/Inlining
-

Yet to be written.

+

+ Bottom-up inlining of functions into callers. +

@@ -681,7 +1025,18 @@ if (i == j) Insert instrumentation for block profiling
-

Yet to be written.

+

+ This pass instruments the specified program with counters for basic block + profiling, which counts the number of times each basic block executes. This + is the most basic form of profiling, which can tell which blocks are hot, but + cannot reliably detect hot paths through the CFG. +

+ +

+ Note that this implementation is very naïve. Control equivalent regions of + the CFG should not require duplicate counters, but it does put duplicate + counters in. +

@@ -689,7 +1044,17 @@ if (i == j) Insert instrumentation for edge profiling
-

Yet to be written.

+

+ This pass instruments the specified program with counters for edge profiling. + Edge profiling can give a reasonable approximation of the hot paths through a + program, and is used for a wide variety of program transformations. +

+ +

+ Note that this implementation is very naïve. It inserts a counter for + every edge in the program, instead of using control flow information + to prune the number of counters inserted. +

@@ -697,7 +1062,10 @@ if (i == j) Insert instrumentation for function profiling
-

Yet to be written.

+

+ This pass instruments the specified program with counters for function + profiling, which counts the number of times each function is called. +

@@ -705,7 +1073,11 @@ if (i == j) Measure profiling framework overhead
-

Yet to be written.

+

+ The basic profiler that does nothing. It is the default profiler and thus + terminates RSProfiler chains. It is useful for measuring + framework overhead. +

@@ -713,7 +1085,20 @@ if (i == j) Insert random sampling instrumentation framework
-

Yet to be written.

+

+ The second stage of the random-sampling instrumentation framework duplicates + all instructions in a function, ignoring the profiling code, then connects the + two versions together at the entry and at backedges. At each connection point + a choice is made as to whether to jump to the profiled code (take a sample) or + execute the unprofiled code. +

+ +

+ After this pass, it is highly recommended to run mem2reg + and adce. instcombine, + load-vn, gdce, and + dse also are good to run afterwards. +

@@ -721,7 +1106,53 @@ if (i == j) Combine redundant instructions
-

Yet to be written.

+

+ Combine instructions to form fewer, simple + instructions. This pass does not modify the CFG. This pass is where algebraic + simplification happens. +

+ +

+ This pass combines things like: +

+ +
%Y = add i32 %X, 1
+%Z = add i32 %Y, 1
+ +

+ into: +

+ +
%Z = add i32 %X, 2
+ +

+ This is a simple worklist driven algorithm. +
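A toy version of the add-folding example above, written as a worklist loop, might look like the following. It handles only the single pattern add(add(X, C1), C2) and uses unbounded Python integers, so it ignores the fixed-width wraparound a real i32 add has; the tuple encoding and the name `combine_adds` are hypothetical.

```python
def combine_adds(instrs):
    """instrs maps a name to ("add", operand_name, constant).

    Rewrites add(add(X, C1), C2) into add(X, C1 + C2) until no
    instruction on the worklist matches the pattern any more.
    """
    result = dict(instrs)
    worklist = list(result)
    while worklist:
        name = worklist.pop()
        op, src, c2 = result[name]
        inner = result.get(src)
        if op == "add" and inner is not None and inner[0] == "add":
            _, x, c1 = inner
            result[name] = ("add", x, c1 + c2)
            worklist.append(name)  # the new operand may be foldable too
    return result


# %Y = add i32 %X, 1 ; %Z = add i32 %Y, 1
combined = combine_adds({"Y": ("add", "X", 1), "Z": ("add", "Y", 1)})
```

The sketch leaves `Y` in place; once `Z` no longer uses it, a dead code elimination step would be expected to remove it.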

+ +

+ This pass guarantees that the following canonicalizations are performed on + the program: +

+ +
@@ -729,7 +1160,11 @@ if (i == j) Internalize Global Symbols
-

Yet to be written.

+

+ This pass loops over all of the functions in the input module, looking for a + main function. If a main function is found, all other functions and all + global variables with initializers are marked as internal. +

@@ -737,7 +1172,13 @@ if (i == j) Interprocedural constant propagation
-

Yet to be written.

+

+ This pass implements an extremely simple interprocedural constant + propagation pass. It could certainly be improved in many different ways, + like using a worklist. This pass makes arguments dead, but does not remove + them. The existing dead argument elimination pass should be run after this + to clean up the mess. +

@@ -745,7 +1186,39 @@ if (i == j) Interprocedural Sparse Conditional Constant Propagation
-

Yet to be written.

+

+ An interprocedural variant of Sparse Conditional Constant + Propagation. +

+
+ + +
+ Thread control through conditional blocks +
+
+

+ Jump threading tries to find distinct threads of control flow running through + a basic block. This pass looks at blocks that have multiple predecessors and + multiple successors. If one or more of the predecessors of the block can be + proven to always cause a jump to one of the successors, we forward the edge + from the predecessor to the successor by duplicating the contents of this + block. +

+

+ An example of when this can occur is code like this: +

+ +
if () { ...
+  X = 4;
+}
+if (X < 3) {
+ +

+ In this case, the unconditional branch at the end of the first if can be + revectored to the false side of the second if. +
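To make the example concrete, the two functions below model the code before and after threading. The condition left empty in the snippet above is represented by a hypothetical `cond` parameter, and the branch outcomes are returned as strings purely so the two versions can be compared.

```python
def before(cond, x):
    """Original shape: the second test always runs."""
    if cond:
        x = 4
    if x < 3:
        return "true side"
    return "false side"


def after(cond, x):
    """Threaded shape: the path through the first 'if' skips the test."""
    if cond:
        x = 4
        # x is known to be 4 here, so x < 3 is known false:
        # jump straight to the false side of the second 'if'.
        return "false side"
    if x < 3:
        return "true side"
    return "false side"


cases = [(c, x) for c in (False, True) for x in range(-2, 8)]
same = all(before(c, x) == after(c, x) for c, x in cases)
```

Only the path where `cond` holds is changed; the other predecessor still evaluates the comparison as before.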

@@ -753,7 +1226,28 @@ if (i == j) Loop-Closed SSA Form Pass
-

Yet to be written.

+

+ This pass transforms loops by placing phi nodes at the end of the loops for + all values that are live across the loop boundary. For example, it turns + the left into the right code: +

+ +
for (...)                for (...)
+  if (c)                   if (c)
+    X1 = ...                 X1 = ...
+  else                     else
+    X2 = ...                 X2 = ...
+  X3 = phi(X1, X2)         X3 = phi(X1, X2)
+... = X3 + 4              X4 = phi(X3)
+                          ... = X4 + 4
+ +

+ This is still valid LLVM; the extra phi nodes are purely redundant, and will + be trivially eliminated by InstCombine. The major benefit of + this transformation is that it makes many other loop optimizations, such as + LoopUnswitching, simpler. +

@@ -761,7 +1255,48 @@ if (i == j) Loop Invariant Code Motion
-

Yet to be written.

+

+ This pass performs loop invariant code motion, attempting to remove as much + code from the body of a loop as possible. It does this by either hoisting + code into the preheader block, or by sinking code to the exit blocks if it is + safe. This pass also promotes must-aliased memory locations in the loop to + live in registers, thus hoisting and sinking "invariant" loads and stores. +
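A source-level sketch of hoisting, assuming a side-effect-free invariant expression. The function names are hypothetical, and the real pass operates on LLVM IR with a preheader block rather than on Python loops.

```python
def before_licm(values, a, b):
    out = []
    for v in values:
        scale = a * b          # loop invariant: recomputed on every iteration
        out.append(v * scale)
    return out


def after_licm(values, a, b):
    scale = a * b              # hoisted to where the preheader would be
    out = []
    for v in values:
        out.append(v * scale)
    return out


print(before_licm([1, 2, 3], 2, 5) == after_licm([1, 2, 3], 2, 5))
```

Hoisting is safe here because `a * b` has no side effects, so evaluating it once before the loop (even if the loop runs zero times) does not change observable behavior.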

+ +

+ This pass uses alias analysis for two purposes: +

+ + +
+ +
+ Dead Loop Deletion Pass +
+
+

+ This file implements the Dead Loop Deletion Pass. This pass is responsible + for eliminating loops with non-infinite computable trip counts that have no + side effects or volatile instructions, and do not contribute to the + computation of the function's return value. +

@@ -769,7 +1304,12 @@ if (i == j) Extract loops into new functions
-

Yet to be written.

+

+ A pass wrapper around the ExtractLoop() scalar transformation to + extract each top-level loop into its own new function. If the loop is the + only loop in a given function, it is not touched. This is a pass most + useful for debugging via bugpoint. +

@@ -777,7 +1317,11 @@ if (i == j) Extract at most one loop into a new function
-

Yet to be written.

+

+ Similar to Extract loops into new functions, + this pass extracts one natural loop from the program into a function if it + can. This is used by bugpoint. +

@@ -785,7 +1329,10 @@ if (i == j) Index Split Loops
-

Yet to be written.

+

+ This pass divides a loop's iteration range by splitting the loop such that each + individual loop is executed efficiently. +

@@ -793,7 +1340,13 @@ if (i == j) Loop Strength Reduction
-

Yet to be written.

+

+ This pass performs a strength reduction on array references inside loops that + have as one or more of their components the loop induction variable. This is + accomplished by creating a new value to hold the initial value of the array + access for the first iteration, and then creating a new GEP instruction in + the loop to increment the value by the appropriate amount. +
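The address arithmetic involved can be sketched as below. This models addresses as plain integers; `base`, `stride`, and both function names are assumptions made for illustration, and the GEP instruction is represented by a simple addition.

```python
def addresses_before(base, stride, n):
    # The address is recomputed from the induction variable with a multiply.
    return [base + i * stride for i in range(n)]


def addresses_after(base, stride, n):
    addrs = []
    ptr = base                 # new value: the address used by the first iteration
    for _ in range(n):
        addrs.append(ptr)
        ptr += stride          # increment by the appropriate amount each iteration
    return addrs


print(addresses_before(1000, 4, 5) == addresses_after(1000, 4, 5))
```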

@@ -801,7 +1354,7 @@ if (i == j) Rotate Loops
-

Yet to be written.

+

A simple loop rotation transformation.

@@ -809,7 +1362,11 @@ if (i == j) Unroll loops
-

Yet to be written.

+

+ This pass implements a simple loop unroller. It works best when loops have + been canonicalized by the -indvars pass, + allowing it to determine the trip counts of loops easily. +
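A minimal sketch of what unrolling does when the trip count is known, assuming a trip count of 8 and an unroll factor of 4 (both chosen only for this example; the function names are hypothetical).

```python
def sum_rolled(data):
    total = 0
    for i in range(8):             # trip count known to be 8
        total += data[i]
    return total


def sum_unrolled_by_4(data):
    total = 0
    for i in range(0, 8, 4):       # two iterations, four copies of the body each
        total += data[i]
        total += data[i + 1]
        total += data[i + 2]
        total += data[i + 3]
    return total


sample = list(range(8))
print(sum_rolled(sample) == sum_unrolled_by_4(sample))
```

Because 8 is a multiple of 4, no remainder loop is needed in this sketch.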

@@ -817,7 +1374,29 @@ if (i == j) Unswitch loops
-

Yet to be written.

+

+ This pass transforms loops that contain branches on loop-invariant conditions + to have multiple loops. For example, it turns the left into the right code: +

+ +
for (...)                  if (lic)
+  A                          for (...)
+  if (lic)                     A; B; C
+    B                      else
+  C                          for (...)
+                               A; C
+ +

+ This can increase the size of the code exponentially (doubling it every time + a loop is unswitched) so we only unswitch if the resultant code will be + smaller than a threshold. +

+ +

+ This pass expects LICM to be run before it to hoist invariant conditions out + of the loop, to make the unswitching opportunity obvious. +
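The A/B/C example above can be checked with a small sketch that records which statements run. The function names and the trace representation are assumptions made for illustration only.

```python
def before_unswitch(n, lic):
    trace = []
    for i in range(n):
        trace.append(("A", i))
        if lic:                    # loop-invariant condition tested every iteration
            trace.append(("B", i))
        trace.append(("C", i))
    return trace


def after_unswitch(n, lic):
    trace = []
    if lic:                        # condition tested once, outside the loops
        for i in range(n):
            trace.append(("A", i))
            trace.append(("B", i))
            trace.append(("C", i))
    else:
        for i in range(n):
            trace.append(("A", i))
            trace.append(("C", i))
    return trace


for lic in (True, False):
    assert before_unswitch(3, lic) == after_unswitch(3, lic)
```

The sketch also shows where the code growth comes from: the loop body now exists twice, once per value of the condition.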

@@ -825,15 +1404,40 @@ if (i == j) Canonicalize natural loops
-

Yet to be written.

-
- - -
- lowers packed operations to operations on smaller packed datatypes -
-
-

Yet to be written.

+

+ This pass performs several transformations to transform natural loops into a + simpler form, which makes subsequent analyses and transformations simpler and + more effective. +

+ +

+ Loop pre-header insertion guarantees that there is a single, non-critical + entry edge from outside of the loop to the loop header. This simplifies a + number of analyses and transformations, such as LICM. +

+ +

+ Loop exit-block insertion guarantees that all exit blocks from the loop + (blocks which are outside of the loop that have predecessors inside of the + loop) only have predecessors from inside of the loop (and are thus dominated + by the loop header). This simplifies transformations such as store-sinking + that are built into LICM. +

+ +

+ This pass also guarantees that loops will have exactly one backedge. +

+ +

+ Note that the simplifycfg pass will clean up blocks which are split out but + end up being unnecessary, so usage of this pass should not pessimize + generated code. +

+ +

+ This pass obviously modifies the CFG, but updates loop information and + dominator information. +

@@ -841,15 +1445,15 @@ if (i == j) Lower allocations from instructions to calls
-

Yet to be written.

-
+

+ Turn malloc and free instructions into @malloc and + @free calls. +

- -
- Lower GC intrinsics, for GCless code generators -
-
-

Yet to be written.

+

+ This is a target-dependent transformation because it depends on the size of + data types and alignment constraints. +

@@ -857,39 +1461,108 @@ if (i == j) Lower invoke and unwind, for unwindless code generators
-

Yet to be written.

+

+ This transformation is designed for use by code generators which do not yet + support stack unwinding. This pass supports two models of exception handling + lowering, the 'cheap' support and the 'expensive' support. +

+ +

+ 'Cheap' exception handling support gives the program the ability to execute + any program which does not "throw an exception", by turning 'invoke' + instructions into calls and by turning 'unwind' instructions into calls to + abort(). If the program does dynamically use the unwind instruction, the + program will print a message then abort. +

+ +

+ 'Expensive' exception handling support gives the full exception handling + support to the program at the cost of making the 'invoke' instruction + really expensive. It basically inserts setjmp/longjmp calls to emulate the + exception handling as necessary. +

+ +

+ Because the 'expensive' support slows down programs a lot, and EH is only + used for a subset of the programs, it must be specifically enabled by the + -enable-correct-eh-support option. +

+ +

+ Note that after this pass runs the CFG is not entirely accurate (exceptional + control flow edges are not correct anymore) so only very simple things should + be done after the lowerinvoke pass has run (like generation of native code). + This should not be used as a general purpose "my LLVM-to-LLVM pass doesn't + support the invoke instruction yet" lowering pass. +

- Lower select instructions to branches + Lower Set Jump
-

Yet to be written.

+

+ Lowers setjmp and longjmp to use the LLVM invoke and unwind + instructions as necessary. +

+ +

+ Lowering of longjmp is fairly trivial. We replace the call with a + call to the LLVM library function __llvm_sjljeh_throw_longjmp(). + This unwinds the stack for us calling all of the destructors for + objects allocated on the stack. +

+ +

+ At a setjmp call, the basic block is split and the setjmp + removed. The calls in a function that have a setjmp are converted to + invoke where the except part checks to see if it's a longjmp + exception and, if so, if it's handled in the function. If it is, then it gets + the value returned by the longjmp and goes to where the basic block + was split. invoke instructions are handled in a similar fashion with + the original except block being executed if it isn't a longjmp + except that is handled by that function. +

- Lower Set Jump + Lower SwitchInst's to branches
-

Yet to be written.

+

+ Rewrites switch instructions with a sequence of branches, which + allows targets to get away with not implementing the switch instruction until + it is convenient. +

- Lower SwitchInst's to branches + Promote Memory to Register
-

Yet to be written.

+

+ This file promotes memory references to be register references. It promotes + alloca instructions which only have loads and + stores as uses. An alloca is transformed by using dominator + frontiers to place phi nodes, then traversing the function in + depth-first order to rewrite loads and stores as + appropriate. This is just the standard SSA construction algorithm to construct + "pruned" SSA form. +

- Promote Memory to Register + Optimize use of memcpy and friends
-

Yet to be written.

+

+ This pass performs various transformations related to eliminating memcpy + calls, or transforming sets of stores into memset's. +

@@ -897,7 +1570,10 @@ if (i == j) Unify function exit nodes
-

Yet to be written.

+

+ Ensure that functions have at most one ret instruction in them. + Additionally, it keeps track of which node is the new exit node of the CFG. +
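At source level, the result resembles rewriting a function with several returns so that every path funnels into one exit. The sketch below uses hypothetical function names and is only an analogy for what happens to ret instructions in the IR.

```python
def classify_before(x):
    if x < 0:
        return "negative"      # first exit
    if x == 0:
        return "zero"          # second exit
    return "positive"          # third exit


def classify_after(x):
    if x < 0:
        result = "negative"
    elif x == 0:
        result = "zero"
    else:
        result = "positive"
    return result              # the single, unified exit


for x in (-3, 0, 7):
    assert classify_before(x) == classify_after(x)
```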

@@ -905,7 +1581,21 @@ if (i == j) Predicate Simplifier
-

Yet to be written.

+

+ Path-sensitive optimizer. In a branch where x == y, replace uses of + x with y. Permits further optimization, such as the + elimination of the unreachable call: +

+ +
void test(int *p, int *q)
+{
+  if (p != q)
+    return;
+
+  if (*p != *q)
+    foo(); // unreachable
+}
@@ -913,7 +1603,12 @@ if (i == j) Remove unused exception handling info
-

Yet to be written.

+

+ This file implements a simple interprocedural pass which walks the call-graph, + turning invoke instructions into call instructions if and + only if the callee cannot throw an exception. It implements this as a + bottom-up traversal of the call-graph. +

@@ -921,7 +1616,10 @@ if (i == j) Raise allocations from calls to instructions
-

Yet to be written.

+

+ Converts @malloc and @free calls to malloc and + free instructions. +

@@ -929,7 +1627,22 @@ if (i == j) Reassociate expressions
-

Yet to be written.

+

+ This pass reassociates commutative expressions in an order that is designed + to promote better constant propagation, GCSE, LICM, PRE, etc. +

+ +

+ For example: 4 + (x + 5) ⇒ x + (4 + 5) +

+ +

+ In the implementation of this algorithm, constants are assigned rank = 0, + function arguments are rank = 1, and other values are assigned ranks + corresponding to the reverse post order traversal of current function + (starting at 2), which effectively gives values in deep loops higher rank + than values not in loops. +
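A simplified sketch of rank-based reassociation for a chain of additions follows. The helper `reassociate_add`, the operand representation (ints for constants, strings for named values), and the choice to place higher-ranked operands first are assumptions for illustration; only the rank assignments come from the description above.

```python
def reassociate_add(operands, rank):
    """Reorder the operands of an addition chain by rank and fold constants.

    operands: ints are constants (rank 0), strings are named values.
    rank: maps each named value to its rank (arguments 1, others 2 and up).
    """
    def operand_rank(op):
        return 0 if isinstance(op, int) else rank[op]

    # Higher ranks first, so all constants (rank 0) end up adjacent at the end.
    ordered = sorted(operands, key=operand_rank, reverse=True)
    constants = [op for op in ordered if isinstance(op, int)]
    others = [op for op in ordered if not isinstance(op, int)]
    if constants:
        others.append(sum(constants))   # adjacent constants fold into one
    return others


# 4 + (x + 5)  =>  x + (4 + 5), with x a function argument (rank 1)
print(reassociate_add([4, "x", 5], {"x": 1}))   # prints ['x', 9]
```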

@@ -937,7 +1650,16 @@ if (i == j) Demote all values to stack slots
-

Yet to be written.

+

+ This file demotes all registers to memory references. It is intended to be + the inverse of -mem2reg. By converting to + load instructions, the only values live across basic blocks are + alloca instructions and load instructions before + phi nodes. It is intended that this should make CFG hacking much + easier. To make later hacking easier, the entry block is split into two, such + that all introduced alloca instructions (and nothing else) are in the + entry block. +

@@ -945,7 +1667,21 @@ if (i == j) Scalar Replacement of Aggregates
-

Yet to be written.

+

+ The well-known scalar replacement of aggregates transformation. This + transform breaks up alloca instructions of aggregate type (structure + or array) into individual alloca instructions for each member if + possible. Then, if possible, it transforms the individual alloca + instructions into nice clean scalar SSA form. +

+ +

+ This combines a simple scalar replacement of aggregates algorithm with the mem2reg algorithm because they often interact, + especially for C++ programs. As such, iterating between scalarrepl, + then mem2reg until we run out of things to + promote works well. +

@@ -953,7 +1689,22 @@ if (i == j) Sparse Conditional Constant Propagation
-

Yet to be written.

+

+ Sparse conditional constant propagation and merging, which can be summarized + as: +

+ +
+  1. Assumes values are constant unless proven otherwise
+  2. Assumes BasicBlocks are dead unless proven otherwise
+  3. Proves values to be constant, and replaces them with constants
+  4. Proves conditional branches to be unconditional
+ +

+ Note that this pass has a habit of making definitions be dead. It is a good + idea to run a DCE pass sometime after running this pass. +
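The two optimistic assumptions in the summary above can be sketched with a three-level lattice, as is usual for sparse conditional constant propagation. The names `UNDEFINED`, `OVERDEFINED`, `meet`, and `phi_value`, and the modeling of constants as Python ints and edges as strings, are assumptions of this sketch and not the pass's actual data structures.

```python
UNDEFINED = "undefined"        # optimistic start: nothing known yet
OVERDEFINED = "overdefined"    # proven not to be a single constant


def meet(a, b):
    """Combine two lattice values; constants are modeled as ints."""
    if a == UNDEFINED:
        return b
    if b == UNDEFINED:
        return a
    if a == OVERDEFINED or b == OVERDEFINED:
        return OVERDEFINED
    return a if a == b else OVERDEFINED


def phi_value(incoming, executable_edges):
    """Merge phi inputs, ignoring edges not yet proven executable."""
    state = UNDEFINED
    for value, edge in incoming:
        if edge in executable_edges:
            state = meet(state, value)
    return state


incoming = [(3, "then"), (5, "else")]
print(phi_value(incoming, {"then"}))            # prints 3
print(phi_value(incoming, {"then", "else"}))    # prints overdefined
```

With only the "then" edge proven executable, the phi is still the constant 3; it becomes overdefined only once the "else" edge is also shown to be reachable.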

@@ -961,7 +1712,12 @@ if (i == j) Simplify well-known library calls
-

Yet to be written.

+

+ Applies a variety of small optimizations for calls to specific well-known + functions (e.g. runtime library functions). For example, a call + exit(3) that occurs within the main() function can be + transformed into simply return 3. +

@@ -969,7 +1725,18 @@ if (i == j) Simplify the CFG
-

Yet to be written.

+

+ Performs dead code elimination and basic block merging. Specifically: +

+ +
+  1. Removes basic blocks with no predecessors.
+  2. Merges a basic block into its predecessor if there is only one and the predecessor only has one successor.
+  3. Eliminates PHI nodes for basic blocks with a single predecessor.
+  4. Eliminates a basic block that only contains an unconditional branch.
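The first item in the list above can be sketched on a toy CFG. The representation (a dict from block name to successor list) and the helper name `remove_unreachable` are assumptions made for this example, not the pass's real interface.

```python
def remove_unreachable(cfg, entry):
    """Repeatedly drop non-entry blocks that have no predecessors.

    cfg: dict mapping a block name to the list of its successors.
    """
    cfg = {block: list(succs) for block, succs in cfg.items()}
    changed = True
    while changed:
        changed = False
        has_pred = {s for succs in cfg.values() for s in succs}
        for block in list(cfg):
            if block != entry and block not in has_pred:
                del cfg[block]     # removing it may orphan its successors
                changed = True
    return cfg


cfg = {
    "entry": ["a"],
    "a": [],
    "dead1": ["dead2"],    # no predecessors
    "dead2": ["a"],        # only reachable from dead1
}
print(remove_unreachable(cfg, "entry"))
```

Removing `dead1` leaves `dead2` without predecessors, so the loop runs again and removes it too. A dead block that branches to itself still has a predecessor and is not handled by this simple rule.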
@@ -977,7 +1744,57 @@ if (i == j) Strip all symbols from a module
-

Yet to be written.

+

+ Performs code stripping. This transformation can delete: +

+ +
+  1. names for virtual registers
+  2. symbols for internal globals and functions
+  3. debug information
+ +

+ Note that this transformation makes code much less readable, so it should + only be used in situations where the strip utility would be used, + such as reducing code size or making it harder to reverse engineer code. +

+
+ + +
+ Remove unused function declarations +
+
+

+ This pass loops over all of the functions in the input module, looking for + dead declarations and removing them. Dead declarations are declarations of + functions for which no implementation is available (i.e., declarations for + unused library functions). +

+
+ + +
+ Promote sret arguments +
+
+

+ This pass finds functions that return a struct (using a pointer to the struct + as the first argument of the function, marked with the 'sret' attribute) and + replaces them with a new function that simply returns each of the elements of + that struct (using multiple return values). +

+ +

+ This pass works under a number of conditions: +

+ +
@@ -985,7 +1802,31 @@ if (i == j) Tail Call Elimination
-

Yet to be written.

+

+ This file transforms calls of the current function (self recursion) followed + by a return instruction with a branch to the entry of the function, creating + a loop. This pass also implements the following extensions to the basic + algorithm: +

+ +
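The basic transformation (a self-recursive call immediately followed by a return becomes a branch back to the function entry) can be sketched at source level. The function names are hypothetical, and the `while True` loop stands in for the branch to the entry block.

```python
def gcd_recursive(a, b):
    if b == 0:
        return a
    return gcd_recursive(b, a % b)    # self-recursive call followed by a return


def gcd_loop(a, b):
    while True:                       # plays the role of "branch to the entry"
        if b == 0:
            return a
        a, b = b, a % b               # rebind the arguments, then loop


print(gcd_recursive(48, 18), gcd_loop(48, 18))   # prints 6 6
```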
@@ -993,7 +1834,13 @@ if (i == j) Tail Duplication
-

Yet to be written.

+

+ This pass performs a limited form of tail duplication, intended to simplify + CFGs by removing some unconditional branches. This pass is necessary to + straighten out loops created by the C front-end, but also is capable of + making other code nicer. After this pass is run, the CFG simplify pass + should be run to clean up the mess. +

@@ -1007,7 +1854,10 @@ if (i == j) Dead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)
-

Yet to be written.

+

+ Same as dead argument elimination, but deletes arguments to functions which + are external. This is only for use by bugpoint.

@@ -1015,15 +1865,25 @@ if (i == j) Extract Basic Blocks From Module (for bugpoint use)
-

Yet to be written.

+

+ This pass is used by bugpoint to extract all blocks from the module into their + own functions.

- Bitcode Writer + Preliminary module verification
-

Yet to be written.

+

+ Ensures that the module is in the form required by the Module Verifier pass. +

+ +

+ Running the verifier runs this pass automatically, so there should be no need + to use it directly. +

@@ -1031,7 +1891,50 @@ if (i == j) Module Verifier
-

Yet to be written.

+

+ Verifies LLVM IR code. This is useful to run after an optimization which is + undergoing testing. Note that llvm-as verifies its input before + emitting bitcode, and also that malformed bitcode is likely to make LLVM + crash. All language front-ends are therefore encouraged to verify their output + before performing optimizing transformations. +

+ + + +

+ Note that this does not provide full security verification (like Java), but + instead just tries to ensure that code is well-formed. +

@@ -1039,7 +1942,9 @@ if (i == j) View CFG of function
-

Yet to be written.

+

+ Displays the control flow graph using the GraphViz tool. +

@@ -1047,7 +1952,10 @@ if (i == j) View CFG of function (with no function bodies)
-

Yet to be written.

+

+ Displays the control flow graph using the GraphViz tool, but omitting function + bodies. +

@@ -1055,9 +1963,9 @@ if (i == j)
Reid Spencer
LLVM Compiler Infrastructure