folly/docs/FBVector.md

   1 `folly/FBvector.h`
   2 ------------------
   3
   4 Simply replacing `std::vector` with `folly::fbvector` (after
   5 having included the `folly/FBVector.h` header file) will
   6 improve the performance of your C++ code using vectors with
   7 common coding patterns. The improvements are always non-negative,
   8 almost always measurable, frequently significant, sometimes
   9 dramatic, and occasionally spectacular.
  10
  11 ### Motivation
  12 ***
  13
  14 std::vector is the stalwart abstraction many use for
  15 dynamically-allocated arrays in C++. It is also the best known
  16 and most used of all containers. It may therefore seem a
  17 surprise that `std::vector` leaves important - and sometimes
  18 vital - efficiency opportunities on the table. This document
  19 explains how our own drop-in abstraction `fbvector` improves key
  20 performance aspects of `std::vector`. Refer to
  21 folly/test/FBVectorTest.cpp for a few benchmarks.
  22
  23 ### Memory Handling
  24 ***
  25
  26 It is well known that `std::vector` grows exponentially (at a
  27 constant factor) in order to avoid quadratic growth performance.
  28 The trick is choosing a good factor (any factor greater than 1
  29 ensures O(1) amortized append complexity towards infinity). A
  30 factor that's too small causes frequent vector reallocation; one
  31 that's too large forces the vector to consume much more memory
  32 than needed. The initial HP implementation by Stepanov used a
  33 growth factor of 2, i.e. whenever you'd `push_back` into a vector
  34 without there being room, it would double the current capacity.
  35
  36 With time, other compilers reduced the growth factor to 1.5, but
  37 gcc has staunchly used a growth factor of 2. In fact it can be
  38 mathematically proven that a growth factor of 2 is rigorously the
  39 <i>worst</i> possible because it never allows the vector to reuse
  40 any of its previously-allocated memory. That makes the vector cache-
  41 unfriendly and memory manager unfriendly.
  42
  43 To see why that's the case, consider a large vector of capacity C
  44 residing somewhere at the beginning of an initially unoccupied
  45 chunk. When the request for growth comes about, the vector
  46 (assuming no in-place resizing, see the appropriate section in
  47 this document) will allocate a chunk next to its current chunk,
  48 copy its existing data, and then deallocate the old chunk. So now
  49 we have a chunk of size C followed by a chunk of size k * C.
  50 Continuing this process we'll then have a chunk of size k * k * C
  51 to the right and so on. That leads to a series of the form (using
  52 ^^ for power):
  53
  54 ```
  55     C, C*k,  C*k^^2, C*k^^3, ...
  56 ```
  57
  58 If we choose k = 2 we know that every element in the series will
  59 be strictly larger than the sum of all previous ones because of
  60 the remarkable equality:
  61
  62 ```
  63     1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
  64 ```
  65
  66 What that really means is that the new request for a chunk will
  67 be never satisfiable by coalescing all previously-used chunks.
  68 This is not quite what you'd want.
  69
  70 We would of course want the vector to not crawl forward in
  71 memory, but instead to move back to its previously-allocated
  72 chunks. Any number smaller than 2 guarantees that you'll be able
  73 at some point to reuse the previous chunks. Going through the
  74 math reveals the equation:
  75
  76 ```
  77     k^^n <= 1 + k + k^^2 + ... + k^^(n-2)
  78 ```
  79
  80 If some number n satisfies that equation, it means you can reuse
  81 memory after n reallocations. The graphical solver below reveals
  82 that choosing k = 1.5 (blue line) allows memory reuse after 4
  83 reallocations, choosing k = 1.45 (red line) allows memory reuse
  84 after 3 reallocations, and choosing k = 1.3 (black line) allows
  85 reuse after only 2 reallocations.
  86
  87 ![graphical solutions](./Fbvector--graphical_solutions.png)
  88
  89 Of course, the above makes a number of simplifying assumptions
  90 about how the memory allocator works, but definitely you don't
  91 want to choose the theoretically absolute worst growth factor.
  92 `fbvector` uses a growth factor of 1.5. That does not impede good
  93 performance at small sizes because of the way `fbvector`
  94 cooperates with jemalloc (below).
  95
  96 ### The jemalloc Connection
  97 ***
  98
  99 Virtually all modern allocators allocate memory in fixed-size
 100 quanta that are chosen to minimize management overhead while at
 101 the same time offering good coverage at low slack. For example, an
 102 allocator may choose blocks of doubling size (32, 64, 128,
 103 <t_co>, ...) up to 4096, and then blocks of size multiples of a
 104 page up until 1MB, and then 512KB increments and so on.
 105
 106 As discussed above, `std::vector` also needs to (re)allocate in
 107 quanta. The next quantum is usually defined in terms of the
 108 current size times the infamous growth constant. Because of this
 109 setup, `std::vector` has some slack memory at the end much like
 110 an allocated block has some slack memory at the end.
 111
 112 It doesn't take a rocket surgeon to figure out that an allocator-
 113 aware `std::vector` would be a marriage made in heaven: the
 114 vector could directly request blocks of "perfect" size from the
 115 allocator so there would be virtually no slack in the allocator.
 116 Also, the entire growth strategy could be adjusted to work
 117 perfectly with allocator's own block growth strategy. That's
 118 exactly what `fbvector` does - it automatically detects the use
 119 of jemalloc and adjusts its reallocation strategy accordingly.
 120
 121 But wait, there's more. Many memory allocators do not support in-
 122 place reallocation, although most of them could. This comes from
 123 the now notorious design of `realloc()` to opaquely perform
 124 either in-place reallocation or an allocate-memcpy-deallocate
 125 cycle. Such lack of control subsequently forced all clib-based
 126 allocator designs to avoid in-place reallocation, and that
 127 includes C++'s `new` and `std:allocator`. This is a major loss of
 128 efficiency because an in-place reallocation, being very cheap,
 129 may mean a much less aggressive growth strategy. In turn that
 130 means less slack memory and faster reallocations.
 131
 132 ### Object Relocation
 133 ***
 134
 135 One particularly sensitive topic about handling C++ values is
 136 that they are all conservatively considered <i>non-
 137 relocatable</i>. In contrast, a relocatable value would preserve
 138 its invariant even if its bits were moved arbitrarily in memory.
 139 For example, an `int32` is relocatable because moving its 4 bytes
 140 would preserve its actual value, so the address of that value
 141 does not "matter" to its integrity.
 142
 143 C++'s assumption of non-relocatable values hurts everybody for
 144 the benefit of a few questionable designs. The issue is that
 145 moving a C++ object "by the book" entails (a) creating a new copy
 146 from the existing value; (b) destroying the old value. This is
 147 quite vexing and violates common sense; consider this
 148 hypothetical conversation between Captain Picard and an
 149 incredulous alien:
 150
 151 Incredulous Alien: "So, this teleporter, how does it work?"<br>
 152 Picard: "It beams people and arbitrary matter from one place to
 153 another."<br> Incredulous Alien: "Hmmm... is it safe?"<br>
 154 Picard: "Yes, but earlier models were a hassle. They'd clone the
 155 person to another location. Then the teleporting chief would have
 156 to shoot the original. Ask O'Brien, he was an intern during those
 157 times. A bloody mess, that's what it was."
 158
 159 Only a tiny minority of objects are genuinely non-relocatable:
 160
 161 * Objects that use internal pointers, e.g.:
 162
 163 ``` Cpp
 164     class Ew {
 165       char buffer[1024];
 166       char * pointerInsideBuffer;
 167     public:
 168       Ew() : pointerInsideBuffer(buffer) {}
 169       ...
 170     };
 171 ```
 172
 173 * Objects that need to update "observers" that store pointers to them.
 174
 175 The first class of designs can always be redone at small or no
 176 cost in efficiency. The second class of objects should not be
 177 values in the first place - they should be allocated with `new`
 178 and manipulated using (smart) pointers. It is highly unusual for
 179 a value to have observers that alias pointers to it.
 180
 181 Relocatable objects are of high interest to `std::vector` because
 182 such knowledge makes insertion into the vector and vector
 183 reallocation considerably faster: instead of going to Picard's
 184 copy-destroy cycle, relocatable objects can be moved around
 185 simply by using `memcpy` or `memmove`. This optimization can
 186 yield arbitrarily high wins in efficiency; for example, it
 187 transforms `vector< vector<double> >` or `vector< hash_map<int,
 188 string> >` from risky liabilities into highly workable
 189 compositions.
 190
 191 In order to allow fast relocation without risk, `fbvector` uses a
 192 trait `folly::IsRelocatable` defined in `"folly/Traits.h"`. By default,
 193 `folly::IsRelocatable::value` conservatively yields false. If
 194 you know that your type `Widget` is in fact relocatable, go right
 195 after `Widget`'s definition and write this:
 196
 197 ``` Cpp
 198     // at global namespace level
 199     namespace folly {
 200     struct IsRelocatable<Widget> : boost::true_type {};
 201     }
 202 ```
 203
 204 If you don't do this, `fbvector<Widget>` will fail to compile
 205 with a `BOOST_STATIC_ASSERT`.
 206
 207 #### Additional Constraints
 208
 209 Similar improvements are possible in presence of a "simple" type
 210 - more specifically, one that has a trivial assignment (i.e.
 211 assignment is the same as bitblitting the bits over) or a nothrow
 212 default constructor. These traits are used gainfully by
 213 `fbvector` in a variety of places. Fortunately, these traits are
 214 already present in the C++ standard (well, currently in Boost).
 215 To summarize, in order to work with `fbvector`, a type `Widget`
 216 must pass:
 217
 218 ``` Cpp
 219     BOOST_STATIC_ASSERT(
 220       IsRelocatable<Widget>::value &&
 221       (boost::has_trivial_assign<T>::value || boost::has_nothrow_constructor<T>::value));
 222 ```
 223
 224 These traits go hand in hand; for example, it would be very
 225 difficult to design a class that satisfies one branch of the
 226 conjunction above but not the other. `fbvector` uses these simple
 227 constraints to minimize the number of copies made on many common
 228 operations such as `push_back`, `insert`, or `resize`.
 229
 230 To make it easy for you to state assumptions about a given type
 231 or family of parameterized types, check Traits.h and in
 232 particular handy family of macros FOLLY_ASSUME_FBVECTOR_COMPATIBLE*.
 233
 234 ### Miscellaneous
 235 ***
 236
 237 `fbvector` uses a careful implementation all around to make
 238 sure it doesn't lose efficiency through the cracks. Some future
 239 directions may be in improving raw memory copying (`memcpy` is
 240 not an intrinsic in gcc and does not work terribly well for
 241 large chunks) and in furthering the collaboration with
 242 jemalloc. Have fun!