From 49126402532be5fd1c21fe6cc189e1fece4cb6c5 Mon Sep 17 00:00:00 2001 From: Stepan Dyatkovskiy Date: Wed, 10 Dec 2014 17:42:01 +0000 Subject: [PATCH] Added documentation for MergeFunctions pass: Pass looks for equivalent functions that are mergable and folds them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223931 91177308-0d34-0410-b5e6-96231b3b80d8 --- docs/MergeFunctions.rst | 802 ++++++++++++++++++++++++++++++++++++++++ docs/Passes.rst | 25 +- docs/index.rst | 4 + 3 files changed, 822 insertions(+), 9 deletions(-) create mode 100644 docs/MergeFunctions.rst diff --git a/docs/MergeFunctions.rst b/docs/MergeFunctions.rst new file mode 100644 index 00000000000..6b8012e4b0c --- /dev/null +++ b/docs/MergeFunctions.rst @@ -0,0 +1,802 @@ +================================= +MergeFunctions pass, how it works +================================= + +.. contents:: + :local: + +Introduction +============ +Sometimes code contains equal functions, or functions that does exactly the same +thing even though they are non-equal on the IR level (e.g.: multiplication on 2 +and 'shl 1'). It could happen due to several reasons: mainly, the usage of +templates and automatic code generators. Though, sometimes user itself could +write the same thing twice :-) + +The main purpose of this pass is to recognize such functions and merge them. + +Why would I want to read this document? +--------------------------------------- +Document is the extension to pass comments and describes the pass logic. It +describes algorithm that is used in order to compare functions, it also +explains how we could combine equal functions correctly, keeping module valid. + +Material is brought in top-down form, so reader could start learn pass from +ideas and end up with low-level algorithm details, thus preparing him for +reading the sources. + +So main goal is do describe algorithm and logic here; the concept. This document +is good for you, if you *don't want* to read the source code, but want to +understand pass algorithms. Author tried not to repeat the source-code and +cover only common cases, and thus avoid cases when after minor code changes we +need to update this document. + + +What should I know to be able to follow along with this document? +----------------------------------------------------------------- + +Reader should be familiar with common compile-engineering principles and LLVM +code fundamentals. In this article we suppose reader is familiar with +`Single Static Assingment `_ +concepts. Understanding of +`IR structure `_ is +also important. + +We will use such terms as +"`module `_", +"`function `_", +"`basic block `_", +"`user `_", +"`value `_", +"`instruction `_". + +As a good start point, Kaleidoscope tutorial could be used: + +:doc:`tutorial/index` + +Especially it's important to understand chapter 3 of tutorial: + +:doc:`tutorial/LangImpl3` + +Reader also should know how passes work in LLVM, he could use next article as a +reference and start point here: + +:doc:`WritingAnLLVMPass` + +What else? Well perhaps reader also should have some experience in LLVM pass +debugging and bug-fixing. + +What I gain by reading this document? +------------------------------------- +Main purpose is to provide reader with comfortable form of algorithms +description, namely the human reading text. Since it could be hard to +understand algorithm straight from the source code: pass uses some principles +that have to be explained first. + +Author wishes to everybody to avoid case, when you read code from top to bottom +again and again, and yet you don't understand why we implemented it that way. + +We hope that after this article reader could easily debug and improve +MergeFunctions pass and thus help LLVM project. + +Narrative structure +------------------- +Article consists of three parts. First part explains pass functionality on the +top-level. Second part describes the comparison procedure itself. The third +part describes the merging process. + +In every part author also tried to put the contents into the top-down form. +First, the top-level methods will be described, while the terminal ones will be +at the end, in the tail of each part. If reader will see the reference to the +method that wasn't described yet, he will find its description a bit below. + +Basics +====== + +How to do it? +------------- +Do we need to merge functions? Obvious thing is: yes that's a quite possible +case, since usually we *do* have duplicates. And it would be good to get rid of +them. But how to detect such a duplicates? The idea is next: we split functions +onto small bricks (parts), then we compare "bricks" amount, and if it equal, +compare "bricks" themselves, and then do our conclusions about functions +themselves. + +What the difference it could be? For example, on machine with 64-bit pointers +(let's assume we have only one address space), one function stores 64-bit +integer, while another one stores a pointer. So if the target is a machine +mentioned above, and if functions are identical, except the parameter type (we +could consider it as a part of function type), then we can treat ``uint64_t`` +and``void*`` as equal. + +It was just an example; possible details are described a bit below. + +As another example reader may imagine two more functions. First function +performs multiplication on 2, while the second one performs arithmetic right +shift on 1. + +Possible solutions +^^^^^^^^^^^^^^^^^^ +Let's briefly consider possible options about how and what we have to implement +in order to create full-featured functions merging, and also what it would +meant for us. + +Equal functions detection, obviously supposes "detector" method to be +implemented, latter should answer the question "whether functions are equal". +This "detector" method consists of tiny "sub-detectors", each of them answers +exactly the same question, but for function parts. + +As the second step, we should merge equal functions. So it should be a "merger" +method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2* +function, the result of merging. + +Having such a routines in our hands, we can process whole module, and merge all +equal functions. + +In this case, we have to compare every function with every another function. As +reader could notice, this way seems to be quite expensive. Of course we could +introduce hashing and other helpers, but it is still just an optimization, and +thus the level of O(N*N) complexity. + +Can we reach another level? Could we introduce logarithmical search, or random +access lookup? The answer is: "yes". + +Random-access +""""""""""""" +How it could be done? Just convert each function to number, and gather all of +them in special hash-table. Functions with equal hash are equal. Good hashing +means, that every function part must be taken into account. That means we have +to convert every function part into some number, and then add it into hash. +Lookup-up time would be small, but such approach adds some delay due to hashing +routine. + +Logarithmical search +"""""""""""""""""""" +We could introduce total ordering among the functions set, once we had it we +could then implement a logarithmical search. Lookup time still depends on N, +but adds a little of delay (*log(N)*). + +Present state +""""""""""""" +Both of approaches (random-access and logarithmical) has been implemented and +tested. And both of them gave a very good improvement. And what was most +surprising, logarithmical search was faster; sometimes up to 15%. Hashing needs +some extra CPU time, and it is the main reason why it works slower; in most of +cases total "hashing" time was greater than total "logarithmical-search" time. + +So, preference has been granted to the "logarithmical search". + +Though in the case of need, *logarithmical-search* (read "total-ordering") could +be used as a milestone on our way to the *random-access* implementation. + +Every comparison is based either on the numbers or on flags comparison. In +*random-access* approach we could use the same comparison algorithm. During +comparison we exit once we find the difference, but here we might have to scan +whole function body every time (note, it could be slower). Like in +"total-ordering", we will track every numbers and flags, but instead of +comparison, we should get numbers sequence and then create the hash number. So, +once again, *total-ordering* could be considered as a milestone for even faster +(in theory) random-access approach. + +MergeFunctions, main fields and runOnModule +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +There are two most important fields in class: + +``FnTree`` – the set of all unique functions. It keeps items that couldn't be +merged with each other. It is defined as: + +``std::set FnTree;`` + +Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with +implemented “<” operator among the functions set (below we explain how it works +exactly; this is a key point in fast functions comparison). + +``Deferred`` – merging process can affect bodies of functions that are in +``FnTree`` already. Obviously such functions should be rechecked again. In this +case we remove them from ``FnTree``, and mark them as to be rescanned, namely +put them into ``Deferred`` list. + +runOnModule +""""""""""" +The algorithm is pretty simple: + +1. Put all module's functions into the *worklist*. + +2. Scan *worklist*'s functions twice: first enumerate only strong functions and +then only weak ones: + + 2.1. Loop body: take function from *worklist* (call it *FCur*) and try to + insert it into *FnTree*: check whether *FCur* is equal to one of functions + in *FnTree*. If there *is* equal function in *FnTree* (call it *FExists*): + merge function *FCur* with *FExists*. Otherwise add function from *worklist* + to *FnTree*. + +3. Once *worklist* scanning and merging operations is complete, check *Deferred* +list. If it is not empty: refill *worklist* contents with *Deferred* list and +do step 2 again, if *Deferred* is empty, then exit from method. + +Comparison and logarithmical search +""""""""""""""""""""""""""""""""""" +Let's recall our task: for every function *F* from module *M*, we have to find +equal functions *F`* in shortest time, and merge them into the single function. + +Defining total ordering among the functions set allows to organize functions +into the binary tree. The lookup procedure complexity would be estimated as +O(log(N)) in this case. But how to define *total-ordering*? + +We have to introduce a single rule applicable to every pair of functions, and +following this rule then evaluate which of them is greater. What kind of rule +it could be? Let's declare it as "compare" method, that returns one of 3 +possible values: + +-1, left is *less* than right, + +0, left and right are *equal*, + +1, left is *greater* than right. + +Of course it means, that we have to maintain +*strict and non-strict order relation properties*: + +* reflexivity (``a <= a``, ``a == a``, ``a >= a``), +* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``), +* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``) +* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``). + +As it was mentioned before, comparison routine consists of +"sub-comparison-routines", each of them also consists +"sub-comparison-routines", and so on, finally it ends up with a primitives +comparison. + +Below, we will use the next operations: + +#. ``cmpNumbers(number1, number2)`` is method that returns -1 if left is less + than right; 0, if left and right are equal; and 1 otherwise. + +#. ``cmpFlags(flag1, flag2)`` is hypothetical method that compares two flags. + The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and + ``false`` is 0. + +The rest of article is based on *MergeFunctions.cpp* source code +(*/lib/Transforms/IPO/MergeFunctions.cpp*). We would like to ask +reader to keep this file open nearby, so we could use it as a reference for +further explanations. + +Now we're ready to proceed to the next chapter and see how it works. + +Functions comparison +==================== +At first, let's define how exactly we compare complex objects. + +Complex objects comparison (function, basic-block, etc) is mostly based on its +sub-objects comparison results. So it is similar to the next "tree" objects +comparison: + +#. For two trees *T1* and *T2* we perform *depth-first-traversal* and have + two sequences as a product: "*T1Items*" and "*T2Items*". + +#. Then compare chains "*T1Items*" and "*T2Items*" in + most-significant-item-first order. Result of items comparison would be the + result of *T1* and *T2* comparison itself. + +FunctionComparator::compare(void) +--------------------------------- +Brief look at the source code tells us, that comparison starts in +“``int FunctionComparator::compare(void)``” method. + +1. First parts to be compared are function's attributes and some properties that +outsides “attributes” term, but still could make function different without +changing its body. This part of comparison is usually done within simple +*cmpNumbers* or *cmpFlags* operations (e.g. +``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is full list of function's +properties to be compared on this stage: + + * *Attributes* (those are returned by ``Function::getAttributes()`` + method). + + * *GC*, for equivalence, *RHS* and *LHS* should be both either without + *GC* or with the same one. + + * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the + same section. + + * *Variable arguments*. *LHS* and *RHS* should be both either with or + without *var-args*. + + * *Calling convention* should be the same. + +2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)`` +method. It checks return type and parameters type; the method itself will be +described later. + +3. Associate function formal parameters with each other. Then comparing function +bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then, +we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s +body, otherwise functions are different. On this stage we grant the preference +to those we met later in function body (value we met first would be *less*). +This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``” +method (will be described a bit later). + +4. Function body comparison. As it written in method comments: + +“We do a CFG-ordered walk since the actual ordering of the blocks in the linked +list is immaterial. Our walk starts at the entry block for both functions, then +takes each block from each terminator in order. As an artifact, this also means +that unreachable blocks are ignored.” + +So, using this walk we get BBs from *left* and *right* in the same order, and +compare them by “``FunctionComparator::compare(const BasicBlock*, const +BasicBlock*)``” method. + +We also associate BBs with each other, like we did it with function formal +arguments (see ``cmpValues`` method below). + +FunctionComparator::cmpType +--------------------------- +Consider how types comparison works. + +1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the +integer type. It could be done if its address space is 0, or if address spaces +are ignored at all. Do the same thing for the right type. + +2. If left and right types are equal, return 0. Otherwise we need to give +preference to one of them. So proceed to the next step. + +3. If types are of different kind (different type IDs). Return result of type +IDs comparison, treating them as a numbers (use ``cmpNumbers`` operation). + +4. If types are vectors or integers, return result of their pointers comparison, +comparing them as numbers. + +5. Check whether type ID belongs to the next group (call it equivalent-group): + + * Void + + * Float + + * Double + + * X86_FP80 + + * FP128 + + * PPC_FP128 + + * Label + + * Metadata. + + If ID belongs to group above, return 0. Since it's enough to see that + types has the same ``TypeID``. No additional information is required. + +6. Left and right are pointers. Return result of address space comparison +(numbers comparison). + +7. Complex types (structures, arrays, etc.). Follow complex objects comparison +technique (see the very first paragraph of this chapter). Both *left* and +*right* are to be expanded and their element types will be checked the same +way. If we get -1 or 1 on some stage, return it. Otherwise return 0. + +8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't +get any conclusions, then invoke ``llvm_unreachable``, since it's quite +unexpectable case. + +cmpValues(const Value*, const Value*) +------------------------------------- +Method that compares local values. + +This method gives us an answer on a very curious quesion: whether we could treat +local values as equal, and which value is greater otherwise. It's better to +start from example: + +Consider situation when we're looking at the same place in left function "*FL*" +and in right function "*FR*". And every part of *left* place is equal to the +corresponding part of *right* place, and (!) both parts use *Value* instances, +for example: + +.. code-block:: llvm + + instr0 i32 %LV ; left side, function FL + instr0 i32 %RV ; right side, function FR + +So, now our conclusion depends on *Value* instances comparison. + +Main purpose of this method is to determine relation between such values. + +What we expect from equal functions? At the same place, in functions "*FL*" and +"*FR*" we expect to see *equal* values, or values *defined* at the same place +in "*FL*" and "*FR*". + +Consider small example here: + +.. code-block:: llvm + + define void %f(i32 %pf0, i32 %pf1) { + instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123 + } + +.. code-block:: llvm + + define void %g(i32 %pg0, i32 %pg1) { + instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123 + } + +In this example, *pf0* is associated with *pg0*, *pf1* is associated with *pg1*, +and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*. + +Instructions with opcode "*instr0*" would be *equal*, since their types and +opcodes are equal, and values are *associated*. + +Instruction with opcode "*instr1*" from *f* is *greater* than instruction with +opcode "*instr1*" from *g*; here we have equal types and opcodes, but "*pf1* is +greater than "*pg0*". + +And instructions with opcode "*instr2*" are equal, because their opcodes and +types are equal, and the same constant is used as a value. + +What we assiciate in cmpValues? +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +* Function arguments. *i*-th argument from left function associated with + *i*-th argument from right function. +* BasicBlock instances. In basic-block enumeration loop we associate *i*-th + BasicBlock from the left function with *i*-th BasicBlock from the right + function. +* Instructions. +* Instruction operands. Note, we can meet *Value* here we have never seen + before. In this case it is not a function argument, nor *BasicBlock*, nor + *Instruction*. It is global value. It is constant, since its the only + supposed global here. Method also compares: +* Constants that are of the same type. +* If right constant could be losslessly bit-casted to the left one, then we + also compare them. + +How to implement cmpValues? +^^^^^^^^^^^^^^^^^^^^^^^^^^^ +*Association* is a case of equality for us. We just treat such values as equal. +But, in general, we need to implement antisymmetric relation. As it was +mentioned above, to understand what is *less*, we can use order in which we +meet values. If both of values has the same order in function (met at the same +time), then treat values as *associated*. Otherwise – it depends on who was +first. + +Every time we run top-level compare method, we initialize two identical maps +(one for the left side, another one for the right side): + +``map sn_mapL, sn_mapR;`` + +The key of the map is the *Value* itself, the *value* – is its order (call it +*serial number*). + +To add value *V* we need to perform the next procedure: + +``sn_map.insert(std::make_pair(V, sn_map.size()));`` + +For the first *Value*, map will return *0*, for second *Value* map will return +*1*, and so on. + +Then we can check whether left and right values met at the same time with simple +comparison: + +``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);`` + +Of course, we can combine insertion and comparison: + +.. code-block:: c++ + + std::pair + LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes + = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); + return cmpNumbers(LeftRes.first->second, RightRes.first->second); + +Let's look, how whole method could be implemented. + +1. we have to start from the bad news. Consider function self and +cross-referencing cases: + +.. code-block:: c++ + + // self-reference unsigned fact0(unsigned n) { return n > 1 ? n + * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n * + fact1(n-1) : 1; } + + // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0; + } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; } + +.. + + This comparison has been implemented in initial *MergeFunctions* pass + version. But, unfortunately, it is not transitive. And this is the only case + we can't convert to less-equal-greater comparison. It is a seldom case, 4-5 + functions of 10000 (checked on test-suite), and, we hope, reader would + forgive us for such a sacrifice in order to get the O(log(N)) pass time. + +2. If left/right *Value* is a constant, we have to compare them. Return 0 if it +is the same constant, or use ``cmpConstants`` method otherwise. + +3. If left/right is *InlineAsm* instance. Return result of *Value* pointers +comparison. + +4. Explicit association of *L* (left value) and *R* (right value). We need to +find out whether values met at the same time, and thus are *associated*. Or we +need to put the rule: when we treat *L* < *R*. Now it is easy: just return +result of numbers comparison: + +.. code-block:: c++ + + std::pair + LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), + RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); + if (LeftRes.first->second == RightRes.first->second) return 0; + if (LeftRes.first->second < RightRes.first->second) return -1; + return 1; + +Now when *cmpValues* returns 0, we can proceed comparison procedure. Otherwise, +if we get (-1 or 1), we need to pass this result to the top level, and finish +comparison procedure. + +cmpConstants +------------ +Performs constants comparison as follows: + +1. Compare constant types using ``cmpType`` method. If result is -1 or 1, goto +step 2, otherwise proceed to step 3. + +2. If types are different, we still can check whether constants could be +losslessly bitcasted to each other. The further explanation is modification of +``canLosslesslyBitCastTo`` method. + + 2.1 Check whether constants are of the first class types + (``isFirstClassType`` check): + + 2.1.1. If both constants are *not* of the first class type: return result + of ``cmpType``. + + 2.1.2. Otherwise, if left type is not of the first class, return -1. If + right type is not of the first class, return 1. + + 2.1.3. If both types are of the first class type, proceed to the next step + (2.1.3.1). + + 2.1.3.1. If types are vectors, compare their bitwidth using the + *cmpNumbers*. If result is not 0, return it. + + 2.1.3.2. Different types, but not a vectors: + + * if both of them are pointers, good for us, we can proceed to step 3. + * if one of types is pointer, return result of *isPointer* flags + comparison (*cmpFlags* operation). + * otherwise we have no methods to prove bitcastability, and thus return + result of types comparison (-1 or 1). + +Steps below are for the case when types are equal, or case when constants are +bitcastable: + +3. One of constants is a "*null*" value. Return the result of +``cmpFlags(L->isNullValue, R->isNullValue)`` comparison. + +4. Compare value IDs, and return result if it is not 0: + +.. code-block:: c++ + + if (int Res = cmpNumbers(L->getValueID(), R->getValueID())) + return Res; + +5. Compare the contents of constants. The comparison depends on kind of +constants, but on this stage it is just a lexicographical comparison. Just see +how it was described in the beginning of "*Functions comparison*" paragraph. +Mathematically it is equal to the next case: we encode left constant and right +constant (with similar way *bitcode-writer* does). Then compare left code +sequence and right code sequence. + +compare(const BasicBlock*, const BasicBlock*) +--------------------------------------------- +Compares two *BasicBlock* instances. + +It enumerates instructions from left *BB* and right *BB*. + +1. It assigns serial numbers to the left and right instructions, using +``cmpValues`` method. + +2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as +greater than other instructions, if both instructions are *GEPs* use ``cmpGEP`` +method for comparison. If result is -1 or 1, pass it to the top-level +comparison (return it). + + 3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or + 1, return it. + + 3.2. Compare number of operands, if result is -1 or 1, return it. + + 3.3. Compare operands themselves, use ``cmpValues`` method. Return result + if it is -1 or 1. + + 3.4. Compare type of operands, using ``cmpType`` method. Return result if + it is -1 or 1. + + 3.5. Proceed to the next instruction. + +4. We can finish instruction enumeration in 3 cases: + + 4.1. We reached the end of both left and right basic-blocks. We didn't + exit on steps 1-3, so contents is equal, return 0. + + 4.2. We have reached the end of the left basic-block. Return -1. + + 4.3. Return 1 (the end of the right basic block). + +cmpGEP +------ +Compares two GEPs (``getelementptr`` instructions). + +It differs from regular operations comparison with the only thing: possibility +to use ``accumulateConstantOffset`` method. + +So, if we get constant offset for both left and right *GEPs*, then compare it as +numbers, and return comparison result. + +Otherwise treat it like a regular operation (see previous paragraph). + +cmpOperation +------------ +Compares instruction opcodes and some important operation properties. + +1. Compare opcodes, if it differs return the result. + +2. Compare number of operands. If it differs – return the result. + +3. Compare operation types, use *cmpType*. All the same – if types are +different, return result. + +4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData`` +method, and compare it like a numbers. + +5. Compare operand types. + +6. For some particular instructions check equivalence (relation in our case) of +some significant attributes. For example we have to compare alignment for +``load`` instructions. + +O(log(N)) +--------- +Methods described above implement order relationship. And latter, could be used +for nodes comparison in a binary tree. So we can organize functions set into +the binary tree and reduce the cost of lookup procedure from +O(N*N) to O(log(N)). + +Merging process, mergeTwoFunctions +================================== +Once *MergeFunctions* detected that current function (*G*) is equal to one that +were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*, +Function*)``. + +Operation affects ``FnTree`` contents with next way: *F* will stay in +``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of +*G* would be replaced with something else. It changes bodies of callers. So, +functions that calls *G* would be put into ``Deferred`` set and removed from +``FnTree``, and analyzed again. + +The approach is next: + +1. Most wished case: when we can use alias and both of *F* and *G* are weak. We +make both of them with aliases to the third strong function *H*. Actually *H* +is *F*. See below how it's made (but it's better to look straight into the +source code). Well, this is a case when we can just replace *G* with *F* +everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*). + +2. *F* could not be overridden, while *G* could. It would be good to do the +next: after merging the places where overridable function were used, still use +overridable stub. So try to make *G* alias to *F*, or create overridable tail +call wrapper around *F* and replace *G* with that call. + +3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just +change the callers: call *F* instead of *G*. That's what +``replaceDirectCallers`` does. + +Below is detailed body description. + +If “F” may be overridden +------------------------ +As follows from ``mayBeOverridden`` comments: “whether the definition of this +global may be replaced by something non-equivalent at link time”. If so, thats +ok: we can use alias to *F* instead of *G* or change call instructions itself. + +HasGlobalAliases, removeUsers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +First consider the case when we have global aliases of one function name to +another. Our purpose is make both of them with aliases to the third strong +function. Though if we keep *F* alive and without major changes we can leave it +in ``FnTree``. Try to combine these two goals. + +Do stub replacement of *F* itself with an alias to *F*. + +1. Create stub function *H*, with the same name and attributes like function +*F*. It takes maximum alignment of *F* and *G*. + +2. Replace all uses of function *F* with uses of function *H*. It is the two +steps procedure instead. First of all, we must take into account, all functions +from whom *F* is called would be changed: since we change the call argument +(from *F* to *H*). If so we must to review these caller functions again after +this procedure. We remove callers from ``FnTree``, method with name +``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``): + + 2.1. ``Inside removeUsers(Value* + V)`` we go through the all values that use value *V* (or *F* in our context). + If value is instruction, we go to function that holds this instruction and + mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove + caller from ``FnTree``. + + 2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``. + +3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*. +Do the same with *G*: replace it with alias to *F*. So finally everywhere *F* +was used, we use *H* and it is alias to *F*, and everywhere *G* was used we +also have alias to *F*. + +4. Set *F* linkage to private. Make it strong :-) + +No global aliases, replaceDirectCallers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +If global aliases are not supported. We call ``replaceDirectCallers`` then. Just +go through all calls of *G* and replace it with calls of *F*. If you look into +method you will see that it scans all uses of *G* too, and if use is callee (if +user is call instruction and *G* is used as what to be called), we replace it +with use of *F*. + +If “F” could not be overridden, fix it! +""""""""""""""""""""""""""""""""""""""" + +We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace +*G* with alias to *F* first. Next conditions are essential: + +* target should support global aliases, +* the address itself of *G* should be not significant, not named and not + referenced anywhere, +* function should come with external, local or weak linkage. + +Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*, +so *G* could be replaced with this wrapper. + +*writeAlias* + +As follows from *llvm* reference: + +“Aliases act as *second name* for the aliasee value”. So we just want to create +second name for *F* and use it instead of *G*: + +1. create global alias itself (*GA*), + +2. adjust alignment of *F* so it must be maximum of current and *G's* alignment; + +3. replace uses of *G*: + + 3.1. first mark all callers of *G* as to-be-analyzed-again, using + ``removeUsers`` method (see chapter above), + + 3.2. call ``G->replaceAllUsesWith(GA)``. + +4. Get rid of *G*. + +*writeThunk* + +As it written in method comments: + +“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G +with bitcast(F). Deletes G.” + +In general it does the same as usual when we want to replace callee, except the +first point: + +1. We generate tail call wrapper around *F*, but with interface that allows use +it instead of *G*. + +2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then. + +3. Get rid of *G*. + +That's it. +========== +We have described how to detect equal functions, and how to merge them, and in +first chapter we have described how it works all-together. Author hopes, reader +have some picture from now, and it helps him improve and debug ­this pass. + +Reader is welcomed to send us any questions and proposals ;-) diff --git a/docs/Passes.rst b/docs/Passes.rst index 9f4009297ac..3f9534182c7 100644 --- a/docs/Passes.rst +++ b/docs/Passes.rst @@ -891,17 +891,24 @@ calls, or transforming sets of stores into ``memset``\ s. This pass looks for equivalent functions that are mergable and folds them. -A hash is computed from the function, based on its type and number of basic -blocks. +Total-ordering is introduced among the functions set: we define comparison +that answers for every two functions which of them is greater. It allows to +arrange functions into the binary tree. -Once all hashes are computed, we perform an expensive equality comparison on -each function pair. This takes n^2/2 comparisons per bucket, so it's important -that the hash function be high quality. The equality comparison iterates -through each instruction in each basic block. +For every new function we check for equivalent in tree. -When a match is found the functions are folded. If both functions are -overridable, we move the functionality into a new internal function and leave -two overridable thunks to it. +If equivalent exists we fold such functions. If both functions are overridable, +we move the functionality into a new internal function and leave two +overridable thunks to it. + +If there is no equivalent, then we add this function to tree. + +Lookup routine has O(log(n)) complexity, while whole merging process has +complexity of O(n*log(n)). + +Read +:doc:`this ` +article for more details. ``-mergereturn``: Unify function exit nodes ------------------------------------------- diff --git a/docs/index.rst b/docs/index.rst index f7a78f28497..bdf788f7b28 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -241,6 +241,7 @@ For API clients and LLVM developers. BigEndianNEON CoverageMappingFormat Statepoints + MergeFunctions :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. @@ -337,6 +338,9 @@ For API clients and LLVM developers. This describes a set of experimental extensions for garbage collection support. +:doc:`MergeFunctions` + Describes functions merging optimization. + Development Process Documentation ================================= -- 2.34.1