folly/docs/FBVector.md

   1 `folly/FBVector.h`
   2 ------------------
   3
   4 Simply replacing `std::vector` with `folly::fbvector` (after
   5 having included the `folly/FBVector.h` header file) will
   6 improve the performance of your C++ code using vectors with
   7 common coding patterns. The improvements are always non-negative,
   8 almost always measurable, frequently significant, sometimes
   9 dramatic, and occasionally spectacular.
  10
  11 ### Sample
  12 ***
  13
  14     folly::fbvector<int> numbers({0, 1, 2, 3});
  15     numbers.reserve(10);
  16     for (int i = 4; i < 10; i++) {
  17       numbers.push_back(i * 2);
  18     }
  19     assert(numbers[6] == 12);
  20
  21 ### Motivation
  22 ***
  23
  24 `std::vector` is the stalwart abstraction many use for
  25 dynamically-allocated arrays in C++. It is also the best known
  26 and most used of all containers. It may therefore seem a
  27 surprise that `std::vector` leaves important - and sometimes
  28 vital - efficiency opportunities on the table. This document
  29 explains how our own drop-in abstraction `fbvector` improves key
  30 performance aspects of `std::vector`. Refer to
  31 folly/test/FBVectorTest.cpp for a few benchmarks.
  32
  33 ### Memory Handling
  34 ***
  35
  36 It is well known that `std::vector` grows exponentially (at a
  37 constant factor) in order to avoid quadratic growth performance.
  38 The trick is choosing a good factor. Any factor greater than 1
  39 ensures O(1) amortized append complexity towards infinity. But a
  40 factor that's too small (say, 1.1) causes frequent vector reallocation, and
  41 one that's too large (say, 3 or 4) forces the vector to consume much more
  42 memory than needed.
  43
  44 The initial HP implementation by Stepanov used a
  45 growth factor of 2; i.e., whenever you'd `push_back` into a vector
  46 without there being room, it would double the current capacity. This
  47 was not a good choice: it can be mathematically proven that a growth factor of
  48 2 is rigorously the <i>worst</i> possible because it never allows the vector
  49 to reuse any of its previously-allocated memory. Despite other compilers
  50 reducing the growth factor to 1.5, gcc has staunchly maintained its factor of
  51 2. This makes `std::vector` cache- unfriendly and memory manager unfriendly.
  52
  53 To see why that's the case, consider a large vector of capacity C.
  54 When there's a request to grow the vector, the vector
  55 (assuming no in-place resizing, see the appropriate section in
  56 this document) will allocate a chunk of memory next to its current chunk,
  57 copy its existing data to the new chunk, and then deallocate the old chunk.
  58 So now we have a chunk of size C followed by a chunk of size k * C. Continuing
  59 this process we'll then have a chunk of size k * k * C to the right and so on.
  60 That leads to a series of the form (using ^^ for power):
  61
  62     C, C*k,  C*k^^2, C*k^^3, ...
  63
  64 If we choose k = 2 we know that every element in the series will
  65 be strictly larger than the sum of all previous ones because of
  66 the remarkable equality:
  67
  68     1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
  69
  70 This means that any new chunk requested will be larger
  71 than all previously used chunks combined, so the vector must
  72 crawl forward in memory; it can't move back to its deallocated chunks.
  73 But any number smaller than 2 guarantees that you'll at some point be
  74 able to reuse the previous chunks. For instance, choosing 1.5 as the factor
  75 allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3
  76 reallocations; and 1.3 allows reuse after only 2 reallocations.
  77
  78 Of course, the above makes a number of simplifying assumptions
  79 about how the memory allocator works, but definitely you don't
  80 want to choose the theoretically absolute worst growth factor.
  81 `fbvector` uses a growth factor of 1.5. That does not impede good
  82 performance at small sizes because of the way `fbvector`
  83 cooperates with jemalloc (below).
  84
  85 ### The jemalloc Connection
  86 ***
  87
  88 Virtually all modern allocators allocate memory in fixed-size
  89 quanta that are chosen to minimize management overhead while at
  90 the same time offering good coverage at low slack. For example, an
  91 allocator may choose blocks of doubling size (32, 64, 128,
  92 <t_co>, ...) up to 4096, and then blocks of size multiples of a
  93 page up until 1MB, and then 512KB increments and so on.
  94
  95 As discussed above, `std::vector` also needs to (re)allocate in
  96 quanta. The next quantum is usually defined in terms of the
  97 current size times the infamous growth constant. Because of this
  98 setup, `std::vector` has some slack memory at the end much like
  99 an allocated block has some slack memory at the end.
 100
 101 It doesn't take a rocket surgeon to figure out that an allocator-
 102 aware `std::vector` would be a marriage made in heaven: the
 103 vector could directly request blocks of "perfect" size from the
 104 allocator so there would be virtually no slack in the allocator.
 105 Also, the entire growth strategy could be adjusted to work
 106 perfectly with allocator's own block growth strategy. That's
 107 exactly what `fbvector` does - it automatically detects the use
 108 of jemalloc and adjusts its reallocation strategy accordingly.
 109
 110 But wait, there's more. Many memory allocators do not support in-
 111 place reallocation, although most of them could. This comes from
 112 the now notorious design of `realloc()` to opaquely perform
 113 either in-place reallocation or an allocate-memcpy-deallocate
 114 cycle. Such lack of control subsequently forced all clib-based
 115 allocator designs to avoid in-place reallocation, and that
 116 includes C++'s `new` and `std::allocator`. This is a major loss of
 117 efficiency because an in-place reallocation, being very cheap,
 118 may mean a much less aggressive growth strategy. In turn that
 119 means less slack memory and faster reallocations.
 120
 121 ### Object Relocation
 122 ***
 123
 124 One particularly sensitive topic about handling C++ values is
 125 that they are all conservatively considered <i>non-
 126 relocatable</i>. In contrast, a relocatable value would preserve
 127 its invariant even if its bits were moved arbitrarily in memory.
 128 For example, an `int32` is relocatable because moving its 4 bytes
 129 would preserve its actual value, so the address of that value
 130 does not "matter" to its integrity.
 131
 132 C++'s assumption of non-relocatable values hurts everybody for
 133 the benefit of a few questionable designs. The issue is that
 134 moving a C++ object "by the book" entails (a) creating a new copy
 135 from the existing value; (b) destroying the old value. This is
 136 quite vexing and violates common sense; consider this
 137 hypothetical conversation between Captain Picard and an
 138 incredulous alien:
 139
 140 Incredulous Alien: "So, this teleporter, how does it work?"<br>
 141 Picard: "It beams people and arbitrary matter from one place to
 142 another."<br> Incredulous Alien: "Hmmm... is it safe?"<br>
 143 Picard: "Yes, but earlier models were a hassle. They'd clone the
 144 person to another location. Then the teleporting chief would have
 145 to shoot the original. Ask O'Brien, he was an intern during those
 146 times. A bloody mess, that's what it was."
 147
 148 Only a tiny minority of objects are genuinely non-relocatable:
 149
 150 * Objects that use internal pointers, e.g.:
 151
 152     class Ew {
 153       char buffer[1024];
 154       char * pointerInsideBuffer;
 155     public:
 156       Ew() : pointerInsideBuffer(buffer) {}
 157       ...
 158     }
 159
 160 * Objects that need to update "observers" that store pointers to them.
 161
 162 The first class of designs can always be redone at small or no
 163 cost in efficiency. The second class of objects should not be
 164 values in the first place - they should be allocated with `new`
 165 and manipulated using (smart) pointers. It is highly unusual for
 166 a value to have observers that alias pointers to it.
 167
 168 Relocatable objects are of high interest to `std::vector` because
 169 such knowledge makes insertion into the vector and vector
 170 reallocation considerably faster: instead of going to Picard's
 171 copy-destroy cycle, relocatable objects can be moved around
 172 simply by using `memcpy` or `memmove`. This optimization can
 173 yield arbitrarily high wins in efficiency; for example, it
 174 transforms `vector< vector<double> >` or `vector< hash_map<int,
 175 string> >` from risky liabilities into highly workable
 176 compositions.
 177
 178 In order to allow fast relocation without risk, `fbvector` uses a
 179 trait `folly::IsRelocatable` defined in `"folly/Traits.h"`. By default,
 180 `folly::IsRelocatable::value` conservatively yields false. If
 181 you know that your type `Widget` is in fact relocatable, go right
 182 after `Widget`'s definition and write this:
 183
 184     // at global namespace level
 185     namespace folly {
 186       struct IsRelocatable<Widget> : boost::true_type {};
 187     }
 188
 189 If you don't do this, `fbvector<Widget>` will fail to compile
 190 with a `static_assert`.
 191
 192 ### Miscellaneous
 193 ***
 194
 195 `fbvector` uses a careful implementation all around to make
 196 sure it doesn't lose efficiency through the cracks. Some future
 197 directions may be in improving raw memory copying (`memcpy` is
 198 not an intrinsic in gcc and does not work terribly well for
 199 large chunks) and in furthering the collaboration with
 200 jemalloc. Have fun!