From 49126402532be5fd1c21fe6cc189e1fece4cb6c5 Mon Sep 17 00:00:00 2001
From: Stepan Dyatkovskiy <stpworld@narod.ru>
Date: Wed, 10 Dec 2014 17:42:01 +0000
Subject: [PATCH] Added documentation for MergeFunctions pass: Pass looks for
 equivalent functions that are mergable and folds them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223931 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/MergeFunctions.rst | 802 ++++++++++++++++++++++++++++++++++++++++
 docs/Passes.rst         |  25 +-
 docs/index.rst          |   4 +
 3 files changed, 822 insertions(+), 9 deletions(-)
 create mode 100644 docs/MergeFunctions.rst

diff --git a/docs/MergeFunctions.rst b/docs/MergeFunctions.rst
new file mode 100644
index 00000000000..6b8012e4b0c
--- /dev/null
+++ b/docs/MergeFunctions.rst
@@ -0,0 +1,802 @@
+=================================
+MergeFunctions pass, how it works
+=================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+Sometimes code contains equal functions, or functions that does exactly the same
+thing even though they are non-equal on the IR level (e.g.: multiplication on 2
+and 'shl 1'). It could happen due to several reasons: mainly, the usage of
+templates and automatic code generators. Though, sometimes user itself could
+write the same thing twice :-)
+
+The main purpose of this pass is to recognize such functions and merge them.
+
+Why would I want to read this document?
+---------------------------------------
+Document is the extension to pass comments and describes the pass logic. It
+describes algorithm that is used in order to compare functions, it also
+explains how we could combine equal functions correctly, keeping module valid.
+
+Material is brought in top-down form, so reader could start learn pass from
+ideas and end up with low-level algorithm details, thus preparing him for
+reading the sources.
+
+So main goal is do describe algorithm and logic here; the concept. This document
+is good for you, if you *don't want* to read the source code, but want to
+understand pass algorithms. Author tried not to repeat the source-code and
+cover only common cases, and thus avoid cases when after minor code changes we
+need to update this document.
+
+
+What should I know to be able to follow along with this document?
+-----------------------------------------------------------------
+
+Reader should be familiar with common compile-engineering principles and LLVM
+code fundamentals. In this article we suppose reader is familiar with
+`Single Static Assingment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+concepts. Understanding of
+`IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_ is
+also important.
+
+We will use such terms as
+"`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_",
+"`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_",
+"`basic block <http://en.wikipedia.org/wiki/Basic_block>`_",
+"`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_",
+"`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_",
+"`instruction <http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_".
+
+As a good start point, Kaleidoscope tutorial could be used:
+
+:doc:`tutorial/index`
+
+Especially it's important to understand chapter 3 of tutorial:
+
+:doc:`tutorial/LangImpl3`
+
+Reader also should know how passes work in LLVM, he could use next article as a
+reference and start point here:
+
+:doc:`WritingAnLLVMPass`
+
+What else? Well perhaps reader also should have some experience in LLVM pass
+debugging and bug-fixing.
+
+What I gain by reading this document?
+-------------------------------------
+Main purpose is to provide reader with comfortable form of algorithms
+description, namely the human reading text. Since it could be hard to
+understand algorithm straight from the source code: pass uses some principles
+that have to be explained first.
+
+Author wishes to everybody to avoid case, when you read code from top to bottom
+again and again, and yet you don't understand why we implemented it that way.
+
+We hope that after this article reader could easily debug and improve
+MergeFunctions pass and thus help LLVM project.
+
+Narrative structure
+-------------------
+Article consists of three parts. First part explains pass functionality on the
+top-level. Second part describes the comparison procedure itself. The third
+part describes the merging process.
+
+In every part author also tried to put the contents into the top-down form.
+First, the top-level methods will be described, while the terminal ones will be
+at the end, in the tail of each part. If reader will see the reference to the
+method that wasn't described yet, he will find its description a bit below.
+
+Basics
+======
+
+How to do it?
+-------------
+Do we need to merge functions? Obvious thing is: yes that's a quite possible
+case, since usually we *do* have duplicates. And it would be good to get rid of
+them. But how to detect such a duplicates? The idea is next: we split functions
+onto small bricks (parts), then we compare "bricks" amount, and if it equal,
+compare "bricks" themselves, and then do our conclusions about functions
+themselves.
+
+What the difference it could be? For example, on machine with 64-bit pointers
+(let's assume we have only one address space),  one function stores 64-bit
+integer, while another one stores a pointer. So if the target is a machine
+mentioned above, and if functions are identical, except the parameter type (we
+could consider it as a part of function type), then we can treat ``uint64_t``
+and``void*`` as equal.
+
+It was just an example; possible details are described a bit below.
+
+As another example reader may imagine two more functions. First function
+performs multiplication on 2, while the second one performs arithmetic right
+shift on 1.
+
+Possible solutions
+^^^^^^^^^^^^^^^^^^
+Let's briefly consider possible options about how and what we have to implement
+in order to create full-featured functions merging, and also what it would
+meant for us.
+
+Equal functions detection, obviously supposes "detector" method to be
+implemented, latter should answer the question "whether functions are equal".
+This "detector" method consists of tiny "sub-detectors", each of them answers
+exactly the same question, but for function parts.
+
+As the second step, we should merge equal functions. So it should be a "merger"
+method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2*
+function, the result of merging.
+
+Having such a routines in our hands, we can process whole module, and merge all
+equal functions.
+
+In this case, we have to compare every function with every another function. As
+reader could notice, this way seems to be quite expensive. Of course we could
+introduce hashing and other helpers, but it is still just an optimization, and
+thus the level of O(N*N) complexity.
+
+Can we reach another level? Could we introduce logarithmical search, or random
+access lookup? The answer is: "yes".
+
+Random-access
+"""""""""""""
+How it could be done? Just convert each function to number, and gather all of
+them in special hash-table. Functions with equal hash are equal. Good hashing
+means, that every function part must be taken into account. That means we have
+to convert every function part into some number, and then add it into hash.
+Lookup-up time would be small, but such approach adds some delay due to hashing
+routine.
+
+Logarithmical search
+""""""""""""""""""""
+We could introduce total ordering among the functions set, once we had it we
+could then implement a logarithmical search. Lookup time still depends on N,
+but adds a little of delay (*log(N)*).
+
+Present state
+"""""""""""""
+Both of approaches (random-access and logarithmical) has been implemented and
+tested. And both of them gave a very good improvement. And what was most
+surprising, logarithmical search was faster; sometimes up to 15%. Hashing needs
+some extra CPU time, and it is the main reason why it works slower; in most of
+cases total "hashing" time was greater than total "logarithmical-search" time.
+
+So, preference has been granted to the "logarithmical search".
+
+Though in the case of need, *logarithmical-search* (read "total-ordering") could
+be used as a milestone on our way to the *random-access* implementation.
+
+Every comparison is based either on the numbers or on flags comparison. In
+*random-access* approach we could use the same comparison algorithm. During
+comparison we exit once we find the difference, but here we might have to scan
+whole function body every time (note, it could be slower). Like in
+"total-ordering", we will track every numbers and flags, but instead of
+comparison, we should get numbers sequence and then create the hash number. So,
+once again, *total-ordering* could be considered as a milestone for even faster
+(in theory) random-access approach.
+
+MergeFunctions, main fields and runOnModule
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+There are two most important fields in class:
+
+``FnTree``  â the set of all unique functions. It keeps items that couldn't be
+merged with each other. It is defined as:
+
+``std::set<FunctionNode> FnTree;``
+
+Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with
+implemented â<â operator among the functions set (below we explain how it works
+exactly; this is a key point in fast functions comparison).
+
+``Deferred`` â merging process can affect bodies of functions that are in
+``FnTree`` already. Obviously such functions should be rechecked again. In this
+case we remove them from ``FnTree``, and mark them as to be rescanned, namely
+put them into ``Deferred`` list.
+
+runOnModule
+"""""""""""
+The algorithm is pretty simple:
+
+1. Put all module's functions into the *worklist*.
+
+2. Scan *worklist*'s functions twice: first enumerate only strong functions and
+then only weak ones:
+
+   2.1. Loop body: take function from *worklist*  (call it *FCur*) and try to
+   insert it into *FnTree*: check whether *FCur* is equal to one of functions
+   in *FnTree*. If there *is* equal function in *FnTree* (call it *FExists*):
+   merge function *FCur* with *FExists*. Otherwise add function from *worklist*
+   to *FnTree*.
+
+3. Once *worklist* scanning and merging operations is complete, check *Deferred*
+list. If it is not empty: refill *worklist* contents with *Deferred* list and
+do step 2 again, if *Deferred* is empty, then exit from method.
+
+Comparison and logarithmical search
+"""""""""""""""""""""""""""""""""""
+Let's recall our task: for every function *F* from module *M*, we have to find
+equal functions *F`* in shortest time, and merge them into the single function.
+
+Defining total ordering among the functions set allows to organize functions
+into the binary tree. The lookup procedure complexity would be estimated as
+O(log(N)) in this case. But how to define *total-ordering*?
+
+We have to introduce a single rule applicable to every pair of functions, and
+following this rule then evaluate which of them is greater. What kind of rule
+it could be? Let's declare it as "compare" method, that returns one of 3
+possible values:
+
+-1, left is *less* than right,
+
+0, left and right are *equal*,
+
+1, left is *greater* than right.
+
+Of course it means, that we have to maintain
+*strict and non-strict order relation properties*:
+
+* reflexivity (``a <= a``, ``a == a``, ``a >= a``),
+* antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``),
+* transitivity (``a <= b`` and ``b <= c``, then ``a <= c``)
+* asymmetry (if ``a < b``, then ``a > b`` or ``a == b``).
+
+As it was mentioned before, comparison routine consists of
+"sub-comparison-routines", each of them also consists
+"sub-comparison-routines", and so on, finally it ends up with a primitives
+comparison.
+
+Below, we will use the next operations:
+
+#. ``cmpNumbers(number1, number2)`` is method that returns -1 if left is less
+   than right; 0, if left and right are equal; and 1 otherwise.
+
+#. ``cmpFlags(flag1, flag2)`` is hypothetical method that compares two flags.
+   The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and
+   ``false`` is 0.
+
+The rest of article is based on *MergeFunctions.cpp* source code
+(*<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like to ask
+reader to keep this file open nearby, so we could use it as a reference for
+further explanations.
+
+Now we're ready to proceed to the next chapter and see how it works.
+
+Functions comparison
+====================
+At first, let's define how exactly we compare complex objects.
+
+Complex objects comparison (function, basic-block, etc) is mostly based on its
+sub-objects comparison results. So it is similar to the next "tree" objects
+comparison:
+
+#. For two trees *T1* and *T2* we perform *depth-first-traversal* and have
+   two sequences as a product: "*T1Items*" and "*T2Items*".
+
+#. Then compare chains "*T1Items*" and "*T2Items*" in
+   most-significant-item-first order. Result of items comparison would be the
+   result of *T1* and *T2* comparison itself.
+
+FunctionComparator::compare(void)
+---------------------------------
+Brief look at the source code tells us, that comparison starts in
+â``int FunctionComparator::compare(void)``â method.
+
+1. First parts to be compared are function's attributes and some properties that
+outsides âattributesâ term, but still could make function different without
+changing its body. This part of comparison is usually done within simple
+*cmpNumbers* or *cmpFlags* operations (e.g.
+``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is full list of function's
+properties to be compared on this stage:
+
+  * *Attributes* (those are returned by ``Function::getAttributes()``
+    method).
+
+  * *GC*, for equivalence, *RHS* and *LHS* should be both either without
+    *GC* or with the same one.
+
+  * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the
+    same section.
+
+  * *Variable arguments*. *LHS* and *RHS* should be both either with or
+    without *var-args*.
+
+  * *Calling convention* should be the same.
+
+2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)``
+method. It checks return type and parameters type; the method itself will be
+described later.
+
+3. Associate function formal parameters with each other. Then comparing function
+bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then,
+we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s
+body, otherwise functions are different. On this stage we grant the preference
+to those we met later in function body (value we met first would be *less*).
+This is done by â``FunctionComparator::cmpValues(const Value*, const Value*)``â
+method (will be described a bit later).
+
+4. Function body comparison. As it written in method comments:
+
+âWe do a CFG-ordered walk since the actual ordering of the blocks in the linked
+list is immaterial. Our walk starts at the entry block for both functions, then
+takes each block from each terminator in order. As an artifact, this also means
+that unreachable blocks are ignored.â
+
+So, using this walk we get BBs from *left* and *right* in the same order, and
+compare them by â``FunctionComparator::compare(const BasicBlock*, const
+BasicBlock*)``â method.
+
+We also associate BBs with each other, like we did it with function formal
+arguments (see ``cmpValues`` method below).
+
+FunctionComparator::cmpType
+---------------------------
+Consider how types comparison works.
+
+1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the
+integer type. It could be done if its address space is 0, or if address spaces
+are ignored at all. Do the same thing for the right type.
+
+2. If left and right types are equal, return 0. Otherwise we need to give
+preference to one of them. So proceed to the next step.
+
+3. If types are of different kind (different type IDs). Return result of type
+IDs comparison, treating them as a numbers (use ``cmpNumbers`` operation).
+
+4. If types are vectors or integers, return result of their pointers comparison,
+comparing them as numbers.
+
+5. Check whether type ID belongs to the next group (call it equivalent-group):
+
+   * Void
+
+   * Float
+
+   * Double
+
+   * X86_FP80
+
+   * FP128
+
+   * PPC_FP128
+
+   * Label
+
+   * Metadata.
+
+   If ID belongs to group above, return 0. Since it's enough to see that
+   types has the same ``TypeID``. No additional information is required.
+
+6. Left and right are pointers. Return result of address space comparison
+(numbers comparison).
+
+7. Complex types (structures, arrays, etc.). Follow complex objects comparison
+technique (see the very first paragraph of this chapter). Both *left* and
+*right* are to be expanded and their element types will be checked the same
+way. If we get -1 or 1 on some stage, return it. Otherwise return 0.
+
+8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't
+get any conclusions, then invoke ``llvm_unreachable``, since it's quite
+unexpectable case.
+
+cmpValues(const Value*, const Value*)
+-------------------------------------
+Method that compares local values.
+
+This method gives us an answer on a very curious quesion: whether we could treat
+local values as equal, and which value is greater otherwise. It's better to
+start from example:
+
+Consider situation when we're looking at the same place in left function "*FL*"
+and in right function "*FR*". And every part of *left* place is equal to the
+corresponding part of *right* place, and (!) both parts use *Value* instances,
+for example:
+
+.. code-block:: llvm
+
+   instr0 i32 %LV   ; left side, function FL
+   instr0 i32 %RV   ; right side, function FR
+
+So, now our conclusion depends on *Value* instances comparison.
+
+Main purpose of this method is to determine relation between such values.
+
+What we expect from equal functions? At the same place, in functions "*FL*" and
+"*FR*" we expect to see *equal* values, or values *defined* at the same place
+in "*FL*" and "*FR*".
+
+Consider small example here:
+
+.. code-block:: llvm
+
+  define void %f(i32 %pf0, i32 %pf1) {
+    instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123
+  }
+
+.. code-block:: llvm
+
+  define void %g(i32 %pg0, i32 %pg1) {
+    instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123
+  }
+
+In this example, *pf0* is associated with *pg0*, *pf1* is associated with *pg1*,
+and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*.
+
+Instructions with opcode "*instr0*" would be *equal*, since their types and
+opcodes are equal, and values are *associated*.
+
+Instruction with opcode "*instr1*" from *f* is *greater* than instruction with
+opcode "*instr1*" from *g*; here we have equal types and opcodes, but "*pf1* is
+greater than "*pg0*".
+
+And instructions with opcode "*instr2*" are equal, because their opcodes and
+types are equal, and the same constant is used as a value.
+
+What we assiciate in cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Function arguments. *i*-th argument from left function associated with
+  *i*-th argument from right function.
+* BasicBlock instances. In basic-block enumeration loop we associate *i*-th
+  BasicBlock from the left function with *i*-th BasicBlock from the right
+  function.
+* Instructions.
+* Instruction operands. Note, we can meet *Value* here we have never seen
+  before. In this case it is not a function argument, nor *BasicBlock*, nor
+  *Instruction*. It is global value. It is constant, since its the only
+  supposed global here. Method also compares:
+* Constants that are of the same type.
+* If right constant could be losslessly bit-casted to the left one, then we
+  also compare them.
+
+How to implement cmpValues?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+*Association* is a case of equality for us. We just treat such values as equal.
+But, in general, we need to implement antisymmetric relation. As it was
+mentioned above, to understand what is *less*, we can use order in which we
+meet values. If both of values has the same order in function (met at the same
+time), then treat values as *associated*. Otherwise â it depends on who was
+first.
+
+Every time we run top-level compare method, we initialize two identical maps
+(one for the left side, another one for the right side):
+
+``map<Value, int> sn_mapL, sn_mapR;``
+
+The key of the map is the *Value* itself, the *value* â is its order (call it
+*serial number*).
+
+To add value *V* we need to perform the next procedure:
+
+``sn_map.insert(std::make_pair(V, sn_map.size()));``
+
+For the first *Value*, map will return *0*, for second *Value* map will return
+*1*, and so on.
+
+Then we can check whether left and right values met at the same time with simple
+comparison:
+
+``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);``
+
+Of course, we can combine insertion and comparison:
+
+.. code-block:: c++
+
+  std::pair<iterator, bool>
+    LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes
+    = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+  return cmpNumbers(LeftRes.first->second, RightRes.first->second);
+
+Let's look, how whole method could be implemented.
+
+1. we have to start from the bad news. Consider function self and
+cross-referencing cases:
+
+.. code-block:: c++
+
+  // self-reference unsigned fact0(unsigned n) { return n > 1 ? n
+  * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n *
+  fact1(n-1) : 1; }
+
+  // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0;
+  } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
+
+..
+
+  This comparison has been implemented in initial *MergeFunctions* pass
+  version. But, unfortunately, it is not transitive. And this is the only case
+  we can't convert to less-equal-greater comparison. It is a seldom case, 4-5
+  functions of 10000 (checked on test-suite), and, we hope, reader would
+  forgive us for such a sacrifice in order to get the O(log(N)) pass time.
+
+2. If left/right *Value* is a constant, we have to compare them. Return 0 if it
+is the same constant, or use ``cmpConstants`` method otherwise.
+
+3. If left/right is *InlineAsm* instance. Return result of *Value* pointers
+comparison.
+
+4. Explicit association of *L* (left value) and *R*  (right value). We need to
+find out whether values met at the same time, and thus are *associated*. Or we
+need to put the rule: when we treat *L* < *R*. Now it is easy: just return
+result of numbers comparison:
+
+.. code-block:: c++
+
+   std::pair<iterator, bool>
+     LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())),
+     RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size()));
+   if (LeftRes.first->second == RightRes.first->second) return 0;
+   if (LeftRes.first->second < RightRes.first->second) return -1;
+   return 1;
+
+Now when *cmpValues* returns 0, we can proceed comparison procedure. Otherwise,
+if we get (-1 or 1), we need to pass this result to the top level, and finish
+comparison procedure.
+
+cmpConstants
+------------
+Performs constants comparison as follows:
+
+1. Compare constant types using ``cmpType`` method. If result is -1 or 1, goto
+step 2, otherwise proceed to step 3.
+
+2. If types are different, we still can check whether constants could be
+losslessly bitcasted to each other. The further explanation is modification of
+``canLosslesslyBitCastTo`` method.
+
+   2.1 Check whether constants are of the first class types
+   (``isFirstClassType`` check):
+
+   2.1.1. If both constants are *not* of the first class type: return result
+   of ``cmpType``.
+
+   2.1.2. Otherwise, if left type is not of the first class, return -1. If
+   right type is not of the first class, return 1.
+
+   2.1.3. If both types are of the first class type, proceed to the next step
+   (2.1.3.1).
+
+   2.1.3.1. If types are vectors, compare their bitwidth using the
+   *cmpNumbers*. If result is not 0, return it.
+
+   2.1.3.2. Different types, but not a vectors:
+
+   * if both of them are pointers, good for us, we can proceed to step 3.
+   * if one of types is pointer, return result of *isPointer* flags
+     comparison (*cmpFlags* operation).
+   * otherwise we have no methods to prove bitcastability, and thus return
+     result of types comparison (-1 or 1).
+
+Steps below are for the case when types are equal, or case when constants are
+bitcastable:
+
+3. One of constants is a "*null*" value. Return the result of
+``cmpFlags(L->isNullValue, R->isNullValue)`` comparison.
+
+4. Compare value IDs, and return result if it is not 0:
+
+.. code-block:: c++
+
+  if (int Res = cmpNumbers(L->getValueID(), R->getValueID()))
+    return Res;
+
+5. Compare the contents of constants. The comparison depends on kind of
+constants, but on this stage it is just a lexicographical comparison. Just see
+how it was described in the beginning of "*Functions comparison*" paragraph.
+Mathematically it is equal to the next case: we encode left constant and right
+constant (with similar way *bitcode-writer* does). Then compare left code
+sequence and right code sequence.
+
+compare(const BasicBlock*, const BasicBlock*)
+---------------------------------------------
+Compares two *BasicBlock* instances.
+
+It enumerates instructions from left *BB* and right *BB*.
+
+1. It assigns serial numbers to the left and right instructions, using
+``cmpValues`` method.
+
+2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as
+greater than other instructions, if both instructions are *GEPs* use ``cmpGEP``
+method for comparison. If result is -1 or 1, pass it to the top-level
+comparison (return it).
+
+   3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or
+   1, return it.
+
+   3.2. Compare number of operands, if result is -1 or 1, return it.
+
+   3.3. Compare operands themselves, use ``cmpValues`` method. Return result
+   if it is -1 or 1.
+
+   3.4. Compare type of operands, using ``cmpType`` method. Return result if
+   it is -1 or 1.
+
+   3.5. Proceed to the next instruction.
+
+4. We can finish instruction enumeration in 3 cases:
+
+   4.1. We reached the end of both left and right basic-blocks. We didn't
+   exit on steps 1-3, so contents is equal, return 0.
+
+   4.2. We have reached the end of the left basic-block. Return -1.
+
+   4.3. Return 1 (the end of the right basic block).
+
+cmpGEP
+------
+Compares two GEPs (``getelementptr`` instructions).
+
+It differs from regular operations comparison with the only thing: possibility
+to use ``accumulateConstantOffset`` method.
+
+So, if we get constant offset for both left and right *GEPs*, then compare it as
+numbers, and return comparison result.
+
+Otherwise treat it like a regular operation (see previous paragraph).
+
+cmpOperation
+------------
+Compares instruction opcodes and some important operation properties.
+
+1. Compare opcodes, if it differs return the result.
+
+2. Compare number of operands. If it differs â return the result.
+
+3. Compare operation types, use *cmpType*. All the same â if types are
+different, return result.
+
+4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData``
+method, and compare it like a numbers.
+
+5. Compare operand types.
+
+6. For some particular instructions check equivalence (relation in our case) of
+some significant attributes. For example we have to compare alignment for
+``load`` instructions.
+
+O(log(N))
+---------
+Methods described above implement order relationship. And latter, could be used
+for nodes comparison in a binary tree. So we can organize functions set into
+the binary tree and reduce the cost of lookup procedure from
+O(N*N) to O(log(N)).
+
+Merging process, mergeTwoFunctions
+==================================
+Once *MergeFunctions* detected that current function (*G*) is equal to one that
+were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*,
+Function*)``.
+
+Operation affects ``FnTree`` contents with next way: *F* will stay in
+``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of
+*G* would be replaced with something else. It changes bodies of callers. So,
+functions that calls *G* would be put into ``Deferred`` set and removed from
+``FnTree``, and analyzed again.
+
+The approach is next:
+
+1. Most wished case: when we can use alias and both of *F* and *G* are weak. We
+make both of them with aliases to the third strong function *H*. Actually *H*
+is *F*. See below how it's made (but it's better to look straight into the
+source code). Well, this is a case when we can just replace *G* with *F*
+everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*).
+
+2. *F* could not be overridden, while *G* could. It would be good to do the
+next: after merging the places where overridable function were used, still use
+overridable stub. So try to make *G* alias to *F*, or create overridable tail
+call wrapper around *F* and replace *G* with that call.
+
+3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just
+change the callers: call *F* instead of *G*.  That's what
+``replaceDirectCallers`` does.
+
+Below is detailed body description.
+
+If âFâ may be overridden
+------------------------
+As follows from ``mayBeOverridden`` comments: âwhether the definition of this
+global may be replaced by something non-equivalent at link timeâ. If so, thats
+ok: we can use alias to *F* instead of *G* or change call instructions itself.
+
+HasGlobalAliases, removeUsers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+First consider the case when we have global aliases of one function name to
+another. Our purpose is  make both of them with aliases to the third strong
+function. Though if we keep *F* alive and without major changes we can leave it
+in ``FnTree``. Try to combine these two goals.
+
+Do stub replacement of *F* itself with an alias to *F*.
+
+1. Create stub function *H*, with the same name and attributes like function
+*F*. It takes maximum alignment of *F* and *G*.
+
+2. Replace all uses of function *F* with uses of function *H*. It is the two
+steps procedure instead. First of all, we must take into account, all functions
+from whom *F* is called would be changed: since we change the call argument
+(from *F* to *H*). If so we must to review these caller functions again after
+this procedure. We remove callers from ``FnTree``, method with name
+``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``):
+
+   2.1. ``Inside removeUsers(Value*
+   V)`` we go through the all values that use value *V* (or *F* in our context).
+   If value is instruction, we go to function that holds this instruction and
+   mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove
+   caller from ``FnTree``.
+
+   2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``.
+
+3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*.
+Do the same with *G*: replace it with alias to *F*. So finally everywhere *F*
+was used, we use *H* and it is alias to *F*, and everywhere *G* was used we
+also have alias to *F*.
+
+4. Set *F* linkage to private. Make it strong :-)
+
+No global aliases, replaceDirectCallers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If global aliases are not supported. We call ``replaceDirectCallers`` then. Just
+go through all calls of *G* and replace it with calls of *F*. If you look into
+method you will see that it scans all uses of *G* too, and if use is callee (if
+user is call instruction and *G* is used as what to be called), we replace it
+with use of *F*.
+
+If âFâ could not be overridden, fix it!
+"""""""""""""""""""""""""""""""""""""""
+
+We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace
+*G* with alias to *F* first. Next conditions are essential:
+
+* target should support global aliases,
+* the address itself of  *G* should be not significant, not named and not
+  referenced anywhere,
+* function should come with external, local or weak linkage.
+
+Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*,
+so *G* could be replaced with this wrapper.
+
+*writeAlias*
+
+As follows from *llvm* reference:
+
+âAliases act as *second name* for the aliasee valueâ. So we just want to create
+second name for *F* and use it instead of *G*:
+
+1. create global alias itself (*GA*),
+
+2. adjust alignment of *F* so it must be maximum of current and *G's* alignment;
+
+3. replace uses of *G*:
+
+   3.1. first mark all callers of *G* as to-be-analyzed-again, using
+   ``removeUsers`` method (see chapter above),
+
+   3.2. call ``G->replaceAllUsesWith(GA)``.
+
+4. Get rid of *G*.
+
+*writeThunk*
+
+As it written in method comments:
+
+âReplace G with a simple tail call to bitcast(F). Also replace direct uses of G
+with bitcast(F). Deletes G.â
+
+In general it does the same as usual when we want to replace callee, except the
+first point:
+
+1. We generate tail call wrapper around *F*, but with interface that allows use
+it instead of *G*.
+
+2. âAs-usualâ: ``removeUsers`` and ``replaceAllUsesWith`` then.
+
+3. Get rid of *G*.
+
+That's it.
+==========
+We have described how to detect equal functions, and how to merge them, and in
+first chapter we have described how it works all-together. Author hopes, reader
+have some picture from now, and it helps him improve and debug Â­this pass.
+
+Reader is welcomed to send us any questions and proposals ;-)
diff --git a/docs/Passes.rst b/docs/Passes.rst
index 9f4009297ac..3f9534182c7 100644
--- a/docs/Passes.rst
+++ b/docs/Passes.rst
@@ -891,17 +891,24 @@ calls, or transforming sets of stores into ``memset``\ s.
 
 This pass looks for equivalent functions that are mergable and folds them.
 
-A hash is computed from the function, based on its type and number of basic
-blocks.
+Total-ordering is introduced among the functions set: we define comparison
+that answers for every two functions which of them is greater. It allows to
+arrange functions into the binary tree.
 
-Once all hashes are computed, we perform an expensive equality comparison on
-each function pair.  This takes n^2/2 comparisons per bucket, so it's important
-that the hash function be high quality.  The equality comparison iterates
-through each instruction in each basic block.
+For every new function we check for equivalent in tree.
 
-When a match is found the functions are folded.  If both functions are
-overridable, we move the functionality into a new internal function and leave
-two overridable thunks to it.
+If equivalent exists we fold such functions. If both functions are overridable,
+we move the functionality into a new internal function and leave two
+overridable thunks to it.
+
+If there is no equivalent, then we add this function to tree.
+
+Lookup routine has O(log(n)) complexity, while whole merging process has
+complexity of O(n*log(n)).
+
+Read
+:doc:`this <MergeFunctions>`
+article for more details.
 
 ``-mergereturn``: Unify function exit nodes
 -------------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index f7a78f28497..bdf788f7b28 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -241,6 +241,7 @@ For API clients and LLVM developers.
    BigEndianNEON
    CoverageMappingFormat
    Statepoints
+   MergeFunctions
 
 :doc:`WritingAnLLVMPass`
    Information on how to write LLVM transformations and analyses.
@@ -337,6 +338,9 @@ For API clients and LLVM developers.
   This describes a set of experimental extensions for garbage
   collection support.
 
+:doc:`MergeFunctions`
+  Describes functions merging optimization.
+
 Development Process Documentation
 =================================
 
-- 
2.34.1