Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6

[folly.git] / folly / docs / FBVector.md
diff --git a/folly/docs/FBVector.md b/folly/docs/FBVector.md

new file mode 100644 (file)

index 0000000..c340a39
--- /dev/null
+++ b/folly/docs/FBVector.md
@@ -0,0 +1,242 @@
+`folly/FBvector.h`
+------------------
+
+Simply replacing `std::vector` with `folly::fbvector` (after
+having included the `folly/FBVector.h` header file) will
+improve the performance of your C++ code using vectors with
+common coding patterns. The improvements are always non-negative,
+almost always measurable, frequently significant, sometimes
+dramatic, and occasionally spectacular.
+
+### Motivation
+***
+
+std::vector is the stalwart abstraction many use for
+dynamically-allocated arrays in C++. It is also the best known
+and most used of all containers. It may therefore seem a
+surprise that `std::vector` leaves important - and sometimes
+vital - efficiency opportunities on the table. This document
+explains how our own drop-in abstraction `fbvector` improves key
+performance aspects of `std::vector`. Refer to
+folly/test/FBVectorTest.cpp for a few benchmarks.
+
+### Memory Handling
+***
+
+It is well known that `std::vector` grows exponentially (at a
+constant factor) in order to avoid quadratic growth performance.
+The trick is choosing a good factor (any factor greater than 1
+ensures O(1) amortized append complexity towards infinity). A
+factor that's too small causes frequent vector reallocation; one
+that's too large forces the vector to consume much more memory
+than needed. The initial HP implementation by Stepanov used a
+growth factor of 2, i.e. whenever you'd `push_back` into a vector
+without there being room, it would double the current capacity.
+
+With time, other compilers reduced the growth factor to 1.5, but
+gcc has staunchly used a growth factor of 2. In fact it can be
+mathematically proven that a growth factor of 2 is rigorously the
+<i>worst</i> possible because it never allows the vector to reuse
+any of its previously-allocated memory. That makes the vector cache-
+unfriendly and memory manager unfriendly.
+
+To see why that's the case, consider a large vector of capacity C
+residing somewhere at the beginning of an initially unoccupied
+chunk. When the request for growth comes about, the vector
+(assuming no in-place resizing, see the appropriate section in
+this document) will allocate a chunk next to its current chunk,
+copy its existing data, and then deallocate the old chunk. So now
+we have a chunk of size C followed by a chunk of size k * C.
+Continuing this process we'll then have a chunk of size k * k * C
+to the right and so on. That leads to a series of the form (using
+^^ for power):
+
+```
+    C, C*k,  C*k^^2, C*k^^3, ...
+```
+
+If we choose k = 2 we know that every element in the series will
+be strictly larger than the sum of all previous ones because of
+the remarkable equality:
+
+```
+    1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
+```
+
+What that really means is that the new request for a chunk will
+be never satisfiable by coalescing all previously-used chunks.
+This is not quite what you'd want.
+
+We would of course want the vector to not crawl forward in
+memory, but instead to move back to its previously-allocated
+chunks. Any number smaller than 2 guarantees that you'll be able
+at some point to reuse the previous chunks. Going through the
+math reveals the equation:
+
+```
+    k^^n <= 1 + k + k^^2 + ... + k^^(n-2)
+```
+
+If some number n satisfies that equation, it means you can reuse
+memory after n reallocations. The graphical solver below reveals
+that choosing k = 1.5 (blue line) allows memory reuse after 4
+reallocations, choosing k = 1.45 (red line) allows memory reuse
+after 3 reallocations, and choosing k = 1.3 (black line) allows
+reuse after only 2 reallocations.
+
+![graphical solutions](./Fbvector--graphical_solutions.png)
+
+Of course, the above makes a number of simplifying assumptions
+about how the memory allocator works, but definitely you don't
+want to choose the theoretically absolute worst growth factor.
+`fbvector` uses a growth factor of 1.5. That does not impede good
+performance at small sizes because of the way `fbvector`
+cooperates with jemalloc (below).
+
+### The jemalloc Connection
+***
+
+Virtually all modern allocators allocate memory in fixed-size
+quanta that are chosen to minimize management overhead while at
+the same time offering good coverage at low slack. For example, an
+allocator may choose blocks of doubling size (32, 64, 128,
+<t_co>, ...) up to 4096, and then blocks of size multiples of a
+page up until 1MB, and then 512KB increments and so on.
+
+As discussed above, `std::vector` also needs to (re)allocate in
+quanta. The next quantum is usually defined in terms of the
+current size times the infamous growth constant. Because of this
+setup, `std::vector` has some slack memory at the end much like
+an allocated block has some slack memory at the end.
+
+It doesn't take a rocket surgeon to figure out that an allocator-
+aware `std::vector` would be a marriage made in heaven: the
+vector could directly request blocks of "perfect" size from the
+allocator so there would be virtually no slack in the allocator.
+Also, the entire growth strategy could be adjusted to work
+perfectly with allocator's own block growth strategy. That's
+exactly what `fbvector` does - it automatically detects the use
+of jemalloc and adjusts its reallocation strategy accordingly.
+
+But wait, there's more. Many memory allocators do not support in-
+place reallocation, although most of them could. This comes from
+the now notorious design of `realloc()` to opaquely perform
+either in-place reallocation or an allocate-memcpy-deallocate
+cycle. Such lack of control subsequently forced all clib-based
+allocator designs to avoid in-place reallocation, and that
+includes C++'s `new` and `std:allocator`. This is a major loss of
+efficiency because an in-place reallocation, being very cheap,
+may mean a much less aggressive growth strategy. In turn that
+means less slack memory and faster reallocations.
+
+### Object Relocation
+***
+
+One particularly sensitive topic about handling C++ values is
+that they are all conservatively considered <i>non-
+relocatable</i>. In contrast, a relocatable value would preserve
+its invariant even if its bits were moved arbitrarily in memory.
+For example, an `int32` is relocatable because moving its 4 bytes
+would preserve its actual value, so the address of that value
+does not "matter" to its integrity.
+
+C++'s assumption of non-relocatable values hurts everybody for
+the benefit of a few questionable designs. The issue is that
+moving a C++ object "by the book" entails (a) creating a new copy
+from the existing value; (b) destroying the old value. This is
+quite vexing and violates common sense; consider this
+hypothetical conversation between Captain Picard and an
+incredulous alien:
+
+Incredulous Alien: "So, this teleporter, how does it work?"<br>
+Picard: "It beams people and arbitrary matter from one place to
+another."<br> Incredulous Alien: "Hmmm... is it safe?"<br>
+Picard: "Yes, but earlier models were a hassle. They'd clone the
+person to another location. Then the teleporting chief would have
+to shoot the original. Ask O'Brien, he was an intern during those
+times. A bloody mess, that's what it was."
+
+Only a tiny minority of objects are genuinely non-relocatable:
+
+* Objects that use internal pointers, e.g.:
+
+``` Cpp
+    class Ew {
+      char buffer[1024];
+      char * pointerInsideBuffer;
+    public:
+      Ew() : pointerInsideBuffer(buffer) {}
+      ...
+    };
+```
+
+* Objects that need to update "observers" that store pointers to them.
+
+The first class of designs can always be redone at small or no
+cost in efficiency. The second class of objects should not be
+values in the first place - they should be allocated with `new`
+and manipulated using (smart) pointers. It is highly unusual for
+a value to have observers that alias pointers to it.
+
+Relocatable objects are of high interest to `std::vector` because
+such knowledge makes insertion into the vector and vector
+reallocation considerably faster: instead of going to Picard's
+copy-destroy cycle, relocatable objects can be moved around
+simply by using `memcpy` or `memmove`. This optimization can
+yield arbitrarily high wins in efficiency; for example, it
+transforms `vector< vector<double> >` or `vector< hash_map<int,
+string> >` from risky liabilities into highly workable
+compositions.
+
+In order to allow fast relocation without risk, `fbvector` uses a
+trait `folly::IsRelocatable` defined in `"folly/Traits.h"`. By default,
+`folly::IsRelocatable::value` conservatively yields false. If
+you know that your type `Widget` is in fact relocatable, go right
+after `Widget`'s definition and write this:
+
+``` Cpp
+    // at global namespace level
+    namespace folly {
+    struct IsRelocatable<Widget> : boost::true_type {};
+    }
+```
+
+If you don't do this, `fbvector<Widget>` will fail to compile
+with a `BOOST_STATIC_ASSERT`.
+
+#### Additional Constraints
+
+Similar improvements are possible in presence of a "simple" type
+- more specifically, one that has a trivial assignment (i.e.
+assignment is the same as bitblitting the bits over) or a nothrow
+default constructor. These traits are used gainfully by
+`fbvector` in a variety of places. Fortunately, these traits are
+already present in the C++ standard (well, currently in Boost).
+To summarize, in order to work with `fbvector`, a type `Widget`
+must pass:
+
+``` Cpp
+    BOOST_STATIC_ASSERT(
+      IsRelocatable<Widget>::value &&
+      (boost::has_trivial_assign<T>::value || boost::has_nothrow_constructor<T>::value));
+```
+
+These traits go hand in hand; for example, it would be very
+difficult to design a class that satisfies one branch of the
+conjunction above but not the other. `fbvector` uses these simple
+constraints to minimize the number of copies made on many common
+operations such as `push_back`, `insert`, or `resize`.
+
+To make it easy for you to state assumptions about a given type
+or family of parameterized types, check Traits.h and in
+particular handy family of macros FOLLY_ASSUME_FBVECTOR_COMPATIBLE*.
+
+### Miscellaneous
+***
+
+`fbvector` uses a careful implementation all around to make
+sure it doesn't lose efficiency through the cracks. Some future
+directions may be in improving raw memory copying (`memcpy` is
+not an intrinsic in gcc and does not work terribly well for
+large chunks) and in furthering the collaboration with
+jemalloc. Have fun!