folly/docs/Synchronized.md

   1 `folly/Synchronized.h`
   2 ----------------------
   3
   4 `folly/Synchronized.h` introduces a simple abstraction for mutex-
   5 based concurrency. It replaces convoluted, unwieldy, and just
   6 plain wrong code with simple constructs that are easy to get
   7 right and difficult to get wrong.
   8
   9 ### Motivation
  10
  11 Many of our multithreaded C++ programs use shared data structures
  12 associated with locks. This follows the time-honored adage of
  13 mutex-based concurrency control "associate mutexes with data, not code".
  14 Consider the following example:
  15
  16 ``` Cpp
  17
  18     class RequestHandler {
  19       ...
  20       RequestQueue requestQueue_;
  21       SharedMutex requestQueueMutex_;
  22
  23       std::map<std::string, Endpoint> requestEndpoints_;
  24       SharedMutex requestEndpointsMutex_;
  25
  26       HandlerState workState_;
  27       SharedMutex workStateMutex_;
  28       ...
  29     };
  30 ```
  31
  32 Whenever the code needs to read or write some of the protected
  33 data, it acquires the mutex for reading or for reading and
  34 writing. For example:
  35
  36 ``` Cpp
  37     void RequestHandler::processRequest(const Request& request) {
  38       stop_watch<> watch;
  39       checkRequestValidity(request);
  40       SharedMutex::WriteHolder lock(requestQueueMutex_);
  41       requestQueue_.push_back(request);
  42       stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
  43       LOG(INFO) << "enqueued request ID " << request.getID();
  44     }
  45 ```
  46
  47 However, the correctness of the technique is entirely predicated on
  48 convention.  Developers manipulating these data members must take care
  49 to explicitly acquire the correct lock for the data they wish to access.
  50 There is no ostensible error for code that:
  51
  52 * manipulates a piece of data without acquiring its lock first
  53 * acquires a different lock instead of the intended one
  54 * acquires a lock in read mode but modifies the guarded data structure
  55 * acquires a lock in read-write mode although it only has `const` access
  56   to the guarded data
  57
  58 ### Introduction to `folly/Synchronized.h`
  59
  60 The same code sample could be rewritten with `Synchronized`
  61 as follows:
  62
  63 ``` Cpp
  64     class RequestHandler {
  65       ...
  66       Synchronized<RequestQueue> requestQueue_;
  67       Synchronized<std::map<std::string, Endpoint>> requestEndpoints_;
  68       Synchronized<HandlerState> workState_;
  69       ...
  70     };
  71
  72     void RequestHandler::processRequest(const Request& request) {
  73       stop_watch<> watch;
  74       checkRequestValidity(request);
  75       requestQueue_.wlock()->push_back(request);
  76       stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
  77       LOG(INFO) << "enqueued request ID " << request.getID();
  78     }
  79 ```
  80
  81 The rewrite does at maximum efficiency what needs to be done:
  82 acquires the lock associated with the `RequestQueue` object, writes to
  83 the queue, and releases the lock immediately thereafter.
  84
  85 On the face of it, that's not much to write home about, and not
  86 an obvious improvement over the previous state of affairs. But
  87 the features at work invisible in the code above are as important
  88 as those that are visible:
  89
  90 * Unlike before, the data and the mutex protecting it are
  91   inextricably encapsulated together.
  92 * If you tried to use `requestQueue_` without acquiring the lock you
  93   wouldn't be able to; it is virtually impossible to access the queue
  94   without acquiring the correct lock.
  95 * The lock is released immediately after the insert operation is
  96   performed, and is not held for operations that do not need it.
  97
  98 If you need to perform several operations while holding the lock,
  99 `Synchronized` provides several options for doing this.
 100
 101 The `wlock()` method (or `lock()` if you have a non-shared mutex type)
 102 returns a `LockedPtr` object that can be stored in a variable.  The lock
 103 will be held for as long as this object exists, similar to a
 104 `std::unique_lock`.  This object can be used as if it were a pointer to
 105 the underlying locked object:
 106
 107 ``` Cpp
 108     {
 109       auto lockedQueue = requestQueue_.wlock();
 110       lockedQueue->push_back(request1);
 111       lockedQueue->push_back(request2);
 112     }
 113 ```
 114
 115 The `rlock()` function is similar to `wlock()`, but acquires a shared lock
 116 rather than an exclusive lock.
 117
 118 We recommend explicitly opening a new nested scope whenever you store a
 119 `LockedPtr` object, to help visibly delineate the critical section, and
 120 to ensure that the `LockedPtr` is destroyed as soon as it is no longer
 121 needed.
 122
 123 Alternatively, `Synchronized` also provides mechanisms to run a function while
 124 holding the lock.  This makes it possible to use lambdas to define brief
 125 critical sections:
 126
 127 ``` Cpp
 128     void RequestHandler::processRequest(const Request& request) {
 129       stop_watch<> watch;
 130       checkRequestValidity(request);
 131       requestQueue_.withWLock([](auto& queue) {
 132         // withWLock() automatically holds the lock for the
 133         // duration of this lambda function
 134         queue.push_back(request);
 135       });
 136       stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
 137       LOG(INFO) << "enqueued request ID " << request.getID();
 138     }
 139 ```
 140
 141 One advantage of the `withWLock()` approach is that it forces a new
 142 scope to be used for the critical section, making the critical section
 143 more obvious in the code, and helping to encourage code that releases
 144 the lock as soon as possible.
 145
 146 ### Template class `Synchronized<T>`
 147
 148 #### Template Parameters
 149
 150 `Synchronized` is a template with two parameters, the data type and a
 151 mutex type: `Synchronized<T, Mutex>`.
 152
 153 If not specified, the mutex type defaults to `std::mutex`.  However, any
 154 mutex type supported by `folly::LockTraits` can be used instead.
 155 `folly::LockTraits` can be specialized to support other custom mutex
 156 types that it does not know about out of the box.  See
 157 `folly/LockTraitsBoost.h` for an example of how to support additional mutex
 158 types.
 159
 160 `Synchronized` provides slightly different APIs when instantiated with a
 161 shared mutex type than with a plain exclusive mutex type.  When used with
 162 a shared mutex type, it has separate `wlock()` and `rlock()` methods,
 163 rather than a single `lock()` method.  Similarly, it has `withWLock()`
 164 and `withRLock()` rather than `withLock()`.  When using a shared mutex
 165 type, these APIs ensure that callers make an explicit choice to acquire
 166 the a shared or an exclusive lock, and that callers do not
 167 unintentionally lock the mutex in the incorrect mode.  The `rlock()`
 168 APIs only provide `const` access to the underlying data type, ensuring
 169 that it cannot be modified when only holding a shared lock.
 170
 171 #### Constructors
 172
 173 The default constructor default-initializes the data and its
 174 associated mutex.
 175
 176
 177 The copy constructor locks the source for reading and copies its
 178 data into the target. (The target is not locked as an object
 179 under construction is only accessed by one thread.)
 180
 181 Finally, `Synchronized<T>` defines an explicit constructor that
 182 takes an object of type `T` and copies it. For example:
 183
 184 ``` Cpp
 185     // Default constructed
 186     Synchronized<map<string, int>> syncMap1;
 187
 188     // Copy constructed
 189     Synchronized<map<string, int>> syncMap2(syncMap1);
 190
 191     // Initializing from an existing map
 192     map<string, int> init;
 193     init["world"] = 42;
 194     Synchronized<map<string, int>> syncMap3(init);
 195     EXPECT_EQ(syncMap3->size(), 1);
 196 ```
 197
 198 #### Assignment, swap, and copying
 199
 200 The canonical assignment operator locks both objects involved and
 201 then copies the underlying data objects. The mutexes are not
 202 copied. The locks are acquired in increasing address order, so
 203 deadlock is avoided. For example, there is no problem if one
 204 thread assigns `a = b` and the other assigns `b = a` (other than
 205 that design probably deserving a Razzie award). Similarly, the
 206 `swap` method takes a reference to another `Synchronized<T>`
 207 object and swaps the data. Again, locks are acquired in a well-
 208 defined order. The mutexes are not swapped.
 209
 210 An additional assignment operator accepts a `const T&` on the
 211 right-hand side. The operator copies the datum inside a
 212 critical section.
 213
 214 In addition to assignment operators, `Synchronized<T>` has move
 215 assignment operators.
 216
 217 An additional `swap` method accepts a `T&` and swaps the data
 218 inside a critical section. This is by far the preferred method of
 219 changing the guarded datum wholesale because it keeps the lock
 220 only for a short time, thus lowering the pressure on the mutex.
 221
 222 To get a copy of the guarded data, there are two methods
 223 available: `void copy(T*)` and `T copy()`. The first copies data
 224 to a provided target and the second returns a copy by value. Both
 225 operations are done under a read lock. Example:
 226
 227 ``` Cpp
 228     Synchronized<vector<string>> syncVec1, syncVec2;
 229     vector<string> vec;
 230
 231     // Assign
 232     syncVec1 = syncVec2;
 233     // Assign straight from vector
 234     syncVec1 = vec;
 235
 236     // Swap
 237     syncVec1.swap(syncVec2);
 238     // Swap with vector
 239     syncVec1.swap(vec);
 240
 241     // Copy to given target
 242     syncVec1.copy(&vec);
 243     // Get a copy by value
 244     auto copy = syncVec1.copy();
 245 ```
 246
 247 #### `lock()`
 248
 249 If the mutex type used with `Synchronized` is a simple exclusive mutex
 250 type (as opposed to a shared mutex), `Synchronized<T>` provides a
 251 `lock()` method that returns a `LockedPtr<T>` to access the data while
 252 holding the lock.
 253
 254 The `LockedPtr` object returned by `lock()` holds the lock for as long
 255 as it exists.  Whenever possible, prefer declaring a separate inner
 256 scope for storing this variable, to make sure the `LockedPtr` is
 257 destroyed as soon as the lock is no longer needed:
 258
 259 ``` Cpp
 260     void fun(Synchronized<vector<string>, std::mutex>& vec) {
 261       {
 262         auto locked = vec.lock();
 263         locked->push_back("hello");
 264         locked->push_back("world");
 265       }
 266       LOG(INFO) << "successfully added greeting";
 267     }
 268 ```
 269
 270 #### `wlock()` and `rlock()`
 271
 272 If the mutex type used with `Synchronized` is a shared mutex type,
 273 `Synchronized<T>` provides a `wlock()` method that acquires an exclusive
 274 lock, and an `rlock()` method that acquires a shared lock.
 275
 276 The `LockedPtr` returned by `rlock()` only provides const access to the
 277 internal data, to ensure that it cannot be modified while only holding a
 278 shared lock.
 279
 280 ``` Cpp
 281     int computeSum(const Synchronized<vector<int>>& vec) {
 282       int sum = 0;
 283       auto locked = vec.rlock();
 284       for (int n : *locked) {
 285         sum += n;
 286       }
 287       return sum;
 288     }
 289
 290     void doubleValues(Synchronized<vector<int>>& vec) {
 291       auto locked = vec.wlock();
 292       for (int& n : *locked) {
 293         n *= 2;
 294       }
 295     }
 296 ```
 297
 298 This example brings us to a cautionary discussion.  The `LockedPtr`
 299 object returned by `lock()`, `wlock()`, or `rlock()` only holds the lock
 300 as long as it exists.  This object makes it difficult to access the data
 301 without holding the lock, but not impossible.  In particular you should
 302 never store a raw pointer or reference to the internal data for longer
 303 than the lifetime of the `LockedPtr` object.
 304
 305 For instance, if we had written the following code in the examples
 306 above, this would have continued accessing the vector after the lock had
 307 been released:
 308
 309 ``` Cpp
 310     // No. NO. NO!
 311     for (int& n : *vec.wlock()) {
 312       n *= 2;
 313     }
 314 ```
 315
 316 The `vec.wlock()` return value is destroyed in this case as soon as the
 317 internal range iterators are created.  The range iterators point into
 318 the vector's data, but lock is released immediately, before executing
 319 the loop body.
 320
 321 Needless to say, this is a crime punishable by long debugging nights.
 322
 323 Range-based for loops are slightly subtle about the lifetime of objects
 324 used in the initializer statement.  Most other problematic use cases are
 325 a bit easier to spot than this, since the lifetime of the `LockedPtr` is
 326 more explicitly visible.
 327
 328 #### `withLock()`
 329
 330 As an alternative to the `lock()` API, `Synchronized` also provides a
 331 `withLock()` method that executes a function or lambda expression while
 332 holding the lock.  The function receives a reference to the data as its
 333 only argument.
 334
 335 This has a few benefits compared to `lock()`:
 336
 337 * The lambda expression requires its own nested scope, making critical
 338   sections more visible in the code.  Callers are recommended to define
 339   a new scope when using `lock()` if they choose to, but this is not
 340   required.  `withLock()` ensures that a new scope must always be
 341   defined.
 342 * Because a new scope is required, `withLock()` also helps encourage
 343   users to release the lock as soon as possible.  Because the critical
 344   section scope is easily visible in the code, it is harder to
 345   accidentally put extraneous code inside the critical section without
 346   realizing it.
 347 * The separate lambda scope makes it more difficult to store raw
 348   pointers or references to the protected data and continue using those
 349   pointers outside the critical section.
 350
 351 For example, `withLock()` makes the range-based for loop mistake from
 352 above much harder to accidentally run into:
 353
 354 ``` Cpp
 355     vec.withLock([](auto& locked) {
 356       for (int& n : locked) {
 357         n *= 2;
 358       }
 359     });
 360 ```
 361
 362 This code does not have the same problem as the counter-example with
 363 `wlock()` above, since the lock is held for the duration of the loop.
 364
 365 When using `Synchronized` with a shared mutex type, it provides separate
 366 `withWLock()` and `withRLock()` methods instead of `withLock()`.
 367
 368 #### Timed Locking
 369
 370 When `Synchronized` is used with a mutex type that supports timed lock
 371 acquisition, `lock()`, `wlock()`, and `rlock()` can all take an optional
 372 `std::chrono::duration` argument.  This argument specifies a timeout to
 373 use for acquiring the lock.  If the lock is not acquired before the
 374 timeout expires, a null `LockedPtr` object will be returned.  Callers
 375 must explicitly check the return value before using it:
 376
 377 ``` Cpp
 378     void fun(Synchronized<vector<string>>& vec) {
 379       {
 380         auto locked = vec.lock(10ms);
 381         if (!locked) {
 382           throw std::runtime_error("failed to acquire lock");
 383         }
 384         locked->push_back("hello");
 385         locked->push_back("world");
 386       }
 387       LOG(INFO) << "successfully added greeting";
 388     }
 389 ```
 390
 391 #### `unlock()` and `scopedUnlock()`
 392
 393 `Synchronized` is a good mechanism for enforcing scoped
 394 synchronization, but it has the inherent limitation that it
 395 requires the critical section to be, well, scoped. Sometimes the
 396 code structure requires a fleeting "escape" from the iron fist of
 397 synchronization, while still inside the critical section scope.
 398
 399 One common pattern is releasing the lock early on error code paths,
 400 prior to logging an error message.  The `LockedPtr` class provides an
 401 `unlock()` method that makes this possible:
 402
 403 ``` Cpp
 404     Synchronized<map<int, string>> dic;
 405     ...
 406     {
 407       auto locked = dic.rlock();
 408       auto iter = locked->find(0);
 409       if (iter == locked.end()) {
 410         locked.unlock();  // don't hold the lock while logging
 411         LOG(ERROR) << "key 0 not found";
 412         return false;
 413       }
 414       processValue(*iter);
 415     }
 416     LOG(INFO) << "succeeded";
 417 ```
 418
 419 For more complex nested control flow scenarios, `scopedUnlock()` returns
 420 an object that will release the lock for as long as it exists, and will
 421 reacquire the lock when it goes out of scope.
 422
 423 ``` Cpp
 424
 425     Synchronized<map<int, string>> dic;
 426     ...
 427     {
 428       auto locked = dic.wlock();
 429       auto iter = locked->find(0);
 430       if (iter == locked->end()) {
 431         {
 432           auto unlocker = locked.scopedUnlock();
 433           LOG(INFO) << "Key 0 not found, inserting it."
 434         }
 435         locked->emplace(0, "zero");
 436       } else {
 437         *iter = "zero";
 438       }
 439     }
 440 ```
 441
 442 Clearly `scopedUnlock()` comes with specific caveats and
 443 liabilities. You must assume that during the `scopedUnlock()`
 444 section, other threads might have changed the protected structure
 445 in arbitrary ways. In the example above, you cannot use the
 446 iterator `iter` and you cannot assume that the key `0` is not in the
 447 map; another thread might have inserted it while you were
 448 bragging on `LOG(INFO)`.
 449
 450 Whenever a `LockedPtr` object has been unlocked, whether with `unlock()`
 451 or `scopedUnlock()`, it will behave as if it is null.  `isNull()` will
 452 return true.  Dereferencing an unlocked `LockedPtr` is not allowed and
 453 will result in undefined behavior.
 454
 455 #### `Synchronized` and `std::condition_variable`
 456
 457 When used with a `std::mutex`, `Synchronized` supports using a
 458 `std::condition_variable` with its internal mutex.  This allows a
 459 `condition_variable` to be used to wait for a particular change to occur
 460 in the internal data.
 461
 462 The `LockedPtr` returned by `Synchronized<T, std::mutex>::lock()` has a
 463 `getUniqueLock()` method that returns a reference to a
 464 `std::unique_lock<std::mutex>`, which can be given to the
 465 `std::condition_variable`:
 466
 467 ``` Cpp
 468     Synchronized<vector<string>, std::mutex> vec;
 469     std::condition_variable emptySignal;
 470
 471     // Assuming some other thread will put data on vec and signal
 472     // emptySignal, we can then wait on it as follows:
 473     auto locked = vec.lock();
 474     emptySignal.wait_for(locked.getUniqueLock(),
 475                          [&] { return !locked->empty(); });
 476 ```
 477
 478 ### `acquireLocked()`
 479
 480 Sometimes locking just one object won't be able to cut the mustard. Consider a
 481 function that needs to lock two `Synchronized` objects at the
 482 same time - for example, to copy some data from one to the other.
 483 At first sight, it looks like sequential `wlock()` calls will work just
 484 fine:
 485
 486 ``` Cpp
 487     void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
 488       auto lockedA = a.wlock();
 489       auto lockedB = b.wlock();
 490       ... use lockedA and lockedB ...
 491     }
 492 ```
 493
 494 This code compiles and may even run most of the time, but embeds
 495 a deadly peril: if one threads call `fun(x, y)` and another
 496 thread calls `fun(y, x)`, then the two threads are liable to
 497 deadlocking as each thread will be waiting for a lock the other
 498 is holding. This issue is a classic that applies regardless of
 499 the fact the objects involved have the same type.
 500
 501 This classic problem has a classic solution: all threads must
 502 acquire locks in the same order. The actual order is not
 503 important, just the fact that the order is the same in all
 504 threads. Many libraries simply acquire mutexes in increasing
 505 order of their address, which is what we'll do, too. The
 506 `acquireLocked()` function takes care of all details of proper
 507 locking of two objects and offering their innards.  It returns a
 508 `std::tuple` of `LockedPtr`s:
 509
 510 ``` Cpp
 511     void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
 512       auto ret = folly::acquireLocked(a, b);
 513       auto& lockedA = std::get<0>(ret);
 514       auto& lockedB = std::get<1>(ret);
 515       ... use lockedA and lockedB ...
 516     }
 517 ```
 518
 519 Note that C++ 17 introduces
 520 (structured binding syntax)[(http://wg21.link/P0144r2)]
 521 which will make the returned tuple more convenient to use:
 522
 523 ``` Cpp
 524     void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
 525       auto [lockedA, lockedB] = folly::acquireLocked(a, b);
 526       ... use lockedA and lockedB ...
 527     }
 528 ```
 529
 530 An `acquireLockedPair()` function is also available, which returns a
 531 `std::pair` instead of a `std::tuple`.  This is more convenient to use
 532 in many situations, until compiler support for structured bindings is
 533 more widely available.
 534
 535 ### Synchronizing several data items with one mutex
 536
 537 The library is geared at protecting one object of a given type
 538 with a mutex. However, sometimes we'd like to protect two or more
 539 members with the same mutex. Consider for example a bidirectional
 540 map, i.e. a map that holds an `int` to `string` mapping and also
 541 the converse `string` to `int` mapping. The two maps would need
 542 to be manipulated simultaneously. There are at least two designs
 543 that come to mind.
 544
 545 #### Using a nested `struct`
 546
 547 You can easily pack the needed data items in a little struct.
 548 For example:
 549
 550 ``` Cpp
 551     class Server {
 552       struct BiMap {
 553         map<int, string> direct;
 554         map<string, int> inverse;
 555       };
 556       Synchronized<BiMap> bimap_;
 557       ...
 558     };
 559     ...
 560     bimap_.withLock([](auto& locked) {
 561       locked.direct[0] = "zero";
 562       locked.inverse["zero"] = 0;
 563     });
 564 ```
 565
 566 With this code in tow you get to use `bimap_` just like any other
 567 `Synchronized` object, without much effort.
 568
 569 #### Using `std::tuple`
 570
 571 If you won't stop short of using a spaceship-era approach,
 572 `std::tuple` is there for you. The example above could be
 573 rewritten for the same functionality like this:
 574
 575 ``` Cpp
 576     class Server {
 577       Synchronized<tuple<map<int, string>, map<string, int>>> bimap_;
 578       ...
 579     };
 580     ...
 581     bimap_.withLock([](auto& locked) {
 582       get<0>(locked)[0] = "zero";
 583       get<1>(locked)["zero"] = 0;
 584     });
 585 ```
 586
 587 The code uses `std::get` with compile-time integers to access the
 588 fields in the tuple. The relative advantages and disadvantages of
 589 using a local struct vs. `std::tuple` are quite obvious - in the
 590 first case you need to invest in the definition, in the second
 591 case you need to put up with slightly more verbose and less clear
 592 access syntax.
 593
 594 ### Summary
 595
 596 `Synchronized` and its supporting tools offer you a simple,
 597 robust paradigm for mutual exclusion-based concurrency. Instead
 598 of manually pairing data with the mutexes that protect it and
 599 relying on convention to use them appropriately, you can benefit
 600 of encapsulation and typechecking to offload a large part of that
 601 task and to provide good guarantees.