Update documentation for Synchronized

[folly.git] / folly / docs / Synchronized.md
diff --git a/folly/docs/Synchronized.md b/folly/docs/Synchronized.md

index e36f1dbbdfbdaf7f472cd2a8c0b0ec78e88423f1..c29955dc08828811724eebb59f289ffa1bb90618 100644 (file)
--- a/folly/docs/Synchronized.md
+++ b/folly/docs/Synchronized.md
@@ -8,26 +8,23 @@ right and difficult to get wrong.
  
  ### Motivation
  
-Many of our multithreaded Thrift services (not to mention general
-concurrent C++ code) use shared data structures associated with
-locks. This follows the time-honored adage of mutex-based
-concurrency control "associate mutexes with data, not code".
-Examples are abundant and easy to find. For example:
+Many of our multithreaded C++ programs use shared data structures
+associated with locks. This follows the time-honored adage of
+mutex-based concurrency control "associate mutexes with data, not code".
+Consider the following example:
  
  ``` Cpp
  
-    class AdPublisherHandler : public AdPopulatorIf,
-                               public fb303::FacebookBase,
-                               public ZkBaseApplication {
+    class RequestHandler {
        ...
-      OnDemandUpdateIdMap adsToBeUpdated_;
-      ReadWriteMutex adsToBeUpdatedLock_;
+      RequestQueue requestQueue_;
+      SharedMutex requestQueueMutex_;
  
-      OnDemandUpdateIdMap limitsToBeUpdated_;
-      ReadWriteMutex limitsToBeUpdatedLock_;
+      std::map<std::string, Endpoint> requestEndpoints_;
+      SharedMutex requestEndpointsMutex_;
  
-      OnDemandUpdateIdMap campaignsToBeUpdated_;
-      ReadWriteMutex campaignsToBeUpdatedLock_;
+      HandlerState workState_;
+      SharedMutex workStateMutex_;
        ...
      };
  ```
@@ -37,39 +34,26 @@ data, it acquires the mutex for reading or for reading and
  writing. For example:
  
  ``` Cpp
-    void AdPublisherHandler::requestUpdateAdId(const int64_t adId,
-                                               const int32_t dbId) {
-      checkDbHandlingStatus(dbId);
-      RWGuard g(adsToBeUpdatedLock_, RW_WRITE);
-      adsToBeUpdated_[dbId][adId] = 1;
-      adPublisherMonitor_->addStatValue("request_adId_update", 1, dbId);
-      LOG(INFO) << "received request to update ad id " << adId;
+    void RequestHandler::processRequest(const Request& request) {
+      stop_watch<> watch;
+      checkRequestValidity(request);
+      SharedMutex::WriteHolder lock(requestQueueMutex_);
+      requestQueue_.push_back(request);
+      stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
+      LOG(INFO) << "enqueued request ID " << request.getID();
      }
  ```
  
-The pattern is an absolute classic and present everywhere.
-However, it is inefficient, makes incorrect code easy to
-write, is prone to deadlocking, and is bulkier than it could
-otherwise be. To expand:
-
-* In the code above, for example, the critical section is only
-  the line right after `RWGuard`'s definition; it is frivolous
-  that everything else (including a splurging `LOG(INFO)`) keeps
-  the lock acquired for no good reason. This is because the
-  locked regions are not visible; the guard's construction
-  introduces a critical section as long as the remainder of the
-  current scope.
-* The correctness of the technique is entirely predicated on
-  convention. There is no ostensible error for code that:
-
-    * manipulates a piece of data without acquiring its lock first
-    * acquires a different lock instead of the intended one
-    * acquires a lock in read mode but modifies the guarded data structure
-    * acquires a lock in read-write mode although it only has `const`
-      access to the guarded data
-    * acquires one lock when another lock is already held, which may
-      lead to deadlocks if another thread acquires locks in the
-      inverse order
+However, the correctness of the technique is entirely predicated on
+convention.  Developers manipulating these data members must take care
+to explicitly acquire the correct lock for the data they wish to access.
+There is no ostensible error for code that:
+
+* manipulates a piece of data without acquiring its lock first
+* acquires a different lock instead of the intended one
+* acquires a lock in read mode but modifies the guarded data structure
+* acquires a lock in read-write mode although it only has `const` access
+  to the guarded data
  
  ### Introduction to `folly/Synchronized.h`
  
@@ -77,32 +61,26 @@ The same code sample could be rewritten with `Synchronized`
  as follows:
  
  ``` Cpp
-    class AdPublisherHandler : public AdPopulatorIf,
-                               public fb303::FacebookBase,
-                               public ZkBaseApplication {
+    class RequestHandler {
        ...
-      Synchronized<OnDemandUpdateIdMap>
-        adsToBeUpdated_,
-        limitsToBeUpdated_,
-        campaignsToBeUpdated_;
+      Synchronized<RequestQueue> requestQueue_;
+      Synchronized<std::map<std::string, Endpoint>> requestEndpoints_;
+      Synchronized<HandlerState> workState_;
        ...
      };
  
-    void AdPublisherHandler::requestUpdateAdId(const int64_t adId,
-                                               const int32_t dbId) {
-      checkDbHandlingStatus(dbId);
-      SYNCHRONIZED (adsToBeUpdated_) {
-        adsToBeUpdated_[dbId][adId] = 1;
-      }
-      adPublisherMonitor_->addStatValue("request_adId_update", 1, dbId);
-      LOG(INFO) << "received request to update ad id " << adId;
+    void RequestHandler::processRequest(const Request& request) {
+      stop_watch<> watch;
+      checkRequestValidity(request);
+      requestQueue_.wlock()->push_back(request);
+      stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
+      LOG(INFO) << "enqueued request ID " << request.getID();
      }
  ```
  
  The rewrite does at maximum efficiency what needs to be done:
-acquires the lock associated with the `OnDemandUpdateIdMap`
-object, writes to the map, and releases the lock immediately
-thereafter.
+acquires the lock associated with the `RequestQueue` object, writes to
+the queue, and releases the lock immediately thereafter.
  
  On the face of it, that's not much to write home about, and not
  an obvious improvement over the previous state of affairs. But
@@ -111,36 +89,86 @@ as those that are visible:
  
  * Unlike before, the data and the mutex protecting it are
    inextricably encapsulated together.
-* Critical sections are readily visible and emphasize code that
-  needs to do minimal work and be subject to extra scrutiny.
-* Dangerous nested `SYNCHRONIZED` statements are more visible
-  than sequenced declarations of guards at the same level. (This
-  is not foolproof because a method call issued inside a
-  `SYNCHRONIZED` scope may open its own `SYNCHRONIZED` block.) A
-  construct `SYNCHRONIZED_DUAL`, discussed later in this
-  document, allows locking two objects quasi-simultaneously in
-  the same order in all threads, thus avoiding deadlocks.
-* If you tried to use `adsToBeUpdated_` outside the
-  `SYNCHRONIZED` scope, you wouldn't be able to; it is virtually
-  impossible to tease the map object without acquiring the
-  correct lock. However, inside the `SYNCHRONIZED` scope, the
-  *same* name serves as the actual underlying object of type
-  `OnDemandUpdateIdMap` (which is a map of maps).
-* Outside `SYNCHRONIZED`, if you just want to call one
-  method, you can do so by using `adsToBeUpdated_` as a
-  pointer like this:
-
-    `adsToBeUpdated_->clear();`
-
-This acquires the mutex, calls `clear()` against the underlying
-map object, and releases the mutex immediately thereafter.
-
-`Synchronized` offers several other methods, which are described
-in detail below.
+* If you tried to use `requestQueue_` without acquiring the lock you
+  wouldn't be able to; it is virtually impossible to access the queue
+  without acquiring the correct lock.
+* The lock is released immediately after the insert operation is
+  performed, and is not held for operations that do not need it.
+
+If you need to perform several operations while holding the lock,
+`Synchronized` provides several options for doing this.
+
+The `wlock()` method (or `lock()` if you have a non-shared mutex type)
+returns a `LockedPtr` object that can be stored in a variable.  The lock
+will be held for as long as this object exists, similar to a
+`std::unique_lock`.  This object can be used as if it were a pointer to
+the underlying locked object:
+
+``` Cpp
+    {
+      auto lockedQueue = requestQueue_.wlock();
+      lockedQueue->push_back(request1);
+      lockedQueue->push_back(request2);
+    }
+```
+
+The `rlock()` function is similar to `wlock()`, but acquires a shared lock
+rather than an exclusive lock.
+
+We recommend explicitly opening a new nested scope whenever you store a
+`LockedPtr` object, to help visibly delineate the critical section, and
+to ensure that the `LockedPtr` is destroyed as soon as it is no longer
+needed.
+
+Alternatively, `Synchronized` also provides mechanisms to run a function while
+holding the lock.  This makes it possible to use lambdas to define brief
+critical sections:
+
+``` Cpp
+    void RequestHandler::processRequest(const Request& request) {
+      stop_watch<> watch;
+      checkRequestValidity(request);
+      requestQueue_.withWLock([](auto& queue) {
+        // withWLock() automatically holds the lock for the
+        // duration of this lambda function
+        queue.push_back(request);
+      });
+      stats_->addStatValue("requestEnqueueLatency", watch.elapsed());
+      LOG(INFO) << "enqueued request ID " << request.getID();
+    }
+```
+
+One advantage of the `withWLock()` approach is that it forces a new
+scope to be used for the critical section, making the critical section
+more obvious in the code, and helping to encourage code that releases
+the lock as soon as possible.
  
  ### Template class `Synchronized<T>`
  
-##### Constructors
+#### Template Parameters
+
+`Synchronized` is a template with two parameters, the data type and a
+mutex type: `Synchronized<T, Mutex>`.
+
+If not specified, the mutex type defaults to `std::mutex`.  However, any
+mutex type supported by `folly::LockTraits` can be used instead.
+`folly::LockTraits` can be specialized to support other custom mutex
+types that it does not know about out of the box.  See
+`folly/LockTraitsBoost.h` for an example of how to support additional mutex
+types.
+
+`Synchronized` provides slightly different APIs when instantiated with a
+shared mutex type than with a plain exclusive mutex type.  When used with
+a shared mutex type, it has separate `wlock()` and `rlock()` methods,
+rather than a single `lock()` method.  Similarly, it has `withWLock()`
+and `withRLock()` rather than `withLock()`.  When using a shared mutex
+type, these APIs ensure that callers make an explicit choice to acquire
+the a shared or an exclusive lock, and that callers do not
+unintentionally lock the mutex in the incorrect mode.  The `rlock()`
+APIs only provide `const` access to the underlying data type, ensuring
+that it cannot be modified when only holding a shared lock.
+
+#### Constructors
  
  The default constructor default-initializes the data and its
  associated mutex.
@@ -155,15 +183,15 @@ takes an object of type `T` and copies it. For example:
  
  ``` Cpp
      // Default constructed
-    Synchronized< map<string, int> > syncMap1;
+    Synchronized<map<string, int>> syncMap1;
  
      // Copy constructed
-    Synchronized< map<string, int> > syncMap2(syncMap1);
+    Synchronized<map<string, int>> syncMap2(syncMap1);
  
      // Initializing from an existing map
      map<string, int> init;
      init["world"] = 42;
-    Synchronized< map<string, int> > syncMap3(init);
+    Synchronized<map<string, int>> syncMap3(init);
      EXPECT_EQ(syncMap3->size(), 1);
  ```
  
@@ -197,8 +225,8 @@ to a provided target and the second returns a copy by value. Both
  operations are done under a read lock. Example:
  
  ``` Cpp
-    Synchronized< fbvector<fbstring> > syncVec1, syncVec2;
-    fbvector<fbstring> vec;
+    Synchronized<vector<string>> syncVec1, syncVec2;
+    vector<string> vec;
  
      // Assign
      syncVec1 = syncVec2;
@@ -216,299 +244,250 @@ operations are done under a read lock. Example:
      auto copy = syncVec1.copy();
  ```
  
-#### `LockedPtr operator->()` and `ConstLockedPtr operator->() const`
+#### `lock()`
  
-We've already seen `operator->` at work. Essentially calling a
-method `obj->foo(x, y, z)` calls the method `foo(x, y, z)` inside
-a critical section as long-lived as the call itself. For example:
+If the mutex type used with `Synchronized` is a simple exclusive mutex
+type (as opposed to a shared mutex), `Synchronized<T>` provides a
+`lock()` method that returns a `LockedPtr<T>` to access the data while
+holding the lock.
  
-``` Cpp
-    void fun(Synchronized< fbvector<fbstring> > & vec) {
-      vec->push_back("hello");
-      vec->push_back("world");
-    }
-```
-
-The code above appends two elements to `vec`, but the elements
-won't appear necessarily one after another. This is because in
-between the two calls the mutex is released, and another thread
-may modify the vector. At the cost of anticipating a little, if
-you want to make sure you insert "world" right after "hello", you
-should do this:
+The `LockedPtr` object returned by `lock()` holds the lock for as long
+as it exists.  Whenever possible, prefer declaring a separate inner
+scope for storing this variable, to make sure the `LockedPtr` is
+destroyed as soon as the lock is no longer needed:
  
  ``` Cpp
-    void fun(Synchronized< fbvector<fbstring> > & vec) {
-      SYNCHRONIZED (vec) {
-        vec.push_back("hello");
-        vec.push_back("world");
+    void fun(Synchronized<vector<string>, std::mutex>& vec) {
+      {
+        auto locked = vec.lock();
+        locked->push_back("hello");
+        locked->push_back("world");
        }
+      LOG(INFO) << "successfully added greeting";
      }
  ```
  
-This brings us to a cautionary discussion. The way `operator->`
-works is rather ingenious with creating an unnamed temporary that
-enforces locking and all, but it's not a panacea. Between two
-uses of `operator->`, other threads may change the synchronized
-object in arbitrary ways, so you shouldn't assume any sort of
-sequential consistency. For example, the innocent-looking code
-below may be patently wrong.
-
-If another thread clears the vector in between the call to
-`empty` and the call to `pop_back`, this code ends up attempting
-to extract an element from an empty vector. Needless to say,
-iteration a la:
-
-``` Cpp
-    // No. NO. NO!
-    FOR_EACH_RANGE (i, vec->begin(), vec->end()) {
-      ...
-    }
-```
-
-is a crime punishable by long debugging nights.
+#### `wlock()` and `rlock()`
  
-If the `Synchronized<T>` object involved is `const`-qualified,
-then you'll only be able to call `const` methods through `operator->`. 
-So, for example, `vec->push_back("xyz")` won't work if `vec`
-were `const`-qualified. The locking mechanism capitalizes on the
-assumption that `const` methods don't modify their underlying
-data and only acquires a read lock (as opposed to a read and
-write lock), which is cheaper but works only if the immutability
-assumption holds. Note that this is strictly not the case because
-`const`-ness can always be undone via `mutable` members, casts,
-and surreptitious access to shared data. Our code is seldom
-guilty of such, and we also assume the STL uses no shenanigans.
-But be warned.
+If the mutex type used with `Synchronized` is a shared mutex type,
+`Synchronized<T>` provides a `wlock()` method that acquires an exclusive
+lock, and an `rlock()` method that acquires a shared lock.
  
-#### `asConst()`
-
-Consider:
+The `LockedPtr` returned by `rlock()` only provides const access to the
+internal data, to ensure that it cannot be modified while only holding a
+shared lock.
  
  ``` Cpp
-    void fun(Synchronized<fbvector<fbstring>> & vec) {
-      if (vec->size() > 1000000) {
-        LOG(WARNING) << "The blinkenlights are overloaded.";
+    int computeSum(const Synchronized<vector<int>>& vec) {
+      int sum = 0;
+      auto locked = vec.rlock();
+      for (int n : *locked) {
+        sum += n;
        }
-      vec->push_back("another blinkenlight");
+      return sum;
      }
-```
  
-This code is correct (at least according to a trivial intent),
-but less efficient than it could otherwise be. This is because
-the call `vec->size()` acquires a full read-write lock, but only
-needs a read lock. We need to help the type system here by
-telling it "even though `vec` is a mutable object, consider it a
-constant for this call". This should be easy enough because
-conversion to const is trivial - just issue `const_cast<const
-Synchronized<fbvector<fbstring>>&>(vec)`. Ouch. To make that
-operation simpler - a lot simpler - `Synchronized<T>` defines the
-method `asConst()`, which is a glorious one-liner. With `asConst`
-in tow, it's very easy to achieve what we wanted:
-
-``` Cpp
-    void fun(Synchronized<fbvector<fbstring>> & vec) {
-      if (vec.asConst()->size() > 1000000) {
-        LOG(WARNING) << "The blinkenlights are overloaded.";
+    void doubleValues(Synchronized<vector<int>>& vec) {
+      auto locked = vec.wlock();
+      for (int& n : *locked) {
+        n *= 2;
        }
-      vec->push_back("another blinkenlight");
      }
  ```
  
-QED (Quite Easy Done). This concludes the documentation for
-`Synchronized<T>`.
+This example brings us to a cautionary discussion.  The `LockedPtr`
+object returned by `lock()`, `wlock()`, or `rlock()` only holds the lock
+as long as it exists.  This object makes it difficult to access the data
+without holding the lock, but not impossible.  In particular you should
+never store a raw pointer or reference to the internal data for longer
+than the lifetime of the `LockedPtr` object.
  
-### `SYNCHRONIZED`
-
-The `SYNCHRONIZED` macro introduces a pseudo-statement that adds
-a whole new level of usability to `Synchronized<T>`. As
-discussed, `operator->` can only lock over the duration of a
-call, so it is insufficient for complex operations. With
-`SYNCHRONIZED` you get to lock the object in a scoped manner (not
-unlike Java's `synchronized` statement) and to directly access
-the object inside that scope.
-
-`SYNCHRONIZED` has two forms. We've seen the first one a couple
-of times already:
+For instance, if we had written the following code in the examples
+above, this would have continued accessing the vector after the lock had
+been released:
  
  ``` Cpp
-    void fun(Synchronized<fbvector<int>> & vec) {
-      SYNCHRONIZED (vec) {
-        vec.push_back(42);
-        CHECK(vec.back() == 42);
-        ...
-      }
+    // No. NO. NO!
+    for (int& n : *vec.wlock()) {
+      n *= 2;
      }
  ```
  
-The scope introduced by `SYNCHRONIZED` is a critical section
-guarded by `vec`'s mutex. In addition to doing that,
-`SYNCHRONIZED` also does an interesting sleight of hand: it binds
-the name `vec` inside the scope to the underlying `fbvector<int>`
-object - as opposed to `vec`'s normal type, which is
-`Synchronized<fbvector<int>>`. This fits very nice the "form
-follow function" - inside the critical section you have earned
-access to the actual data, and the name bindings reflect that as
-well. `SYNCHRONIZED(xyz)` essentially cracks `xyz` temporarily
-and gives you access to its innards.
-
-Now, what if `fun` wants to take a pointer to
-`Synchronized<fbvector<int>>` - let's call it `pvec`? Generally,
-what if we want to synchronize on an expression as opposed to a
-symbolic variable? In that case `SYNCHRONIZED(*pvec)` would not
-work because "`*pvec`" is not a name. That's where the second
-form of `SYNCHRONIZED` kicks in:
+The `vec.wlock()` return value is destroyed in this case as soon as the
+internal range iterators are created.  The range iterators point into
+the vector's data, but lock is released immediately, before executing
+the loop body.
  
-``` Cpp
-    void fun(Synchronized<fbvector<int>> * pvec) {
-      SYNCHRONIZED (vec, *pvec) {
-        vec.push_back(42);
-        CHECK(vec.back() == 42);
-        ...
-      }
-    }
-```
+Needless to say, this is a crime punishable by long debugging nights.
+
+Range-based for loops are slightly subtle about the lifetime of objects
+used in the initializer statement.  Most other problematic use cases are
+a bit easier to spot than this, since the lifetime of the `LockedPtr` is
+more explicitly visible.
  
-Ha, so now we pass two arguments to `SYNCHRONIZED`. The first
-argument is the name bound to the data, and the second argument
-is the expression referring to the `Synchronized<T>` object. So
-all cases are covered.
+#### `withLock()`
  
-### `SYNCHRONIZED_CONST`
+As an alternative to the `lock()` API, `Synchronized` also provides a
+`withLock()` method that executes a function or lambda expression while
+holding the lock.  The function receives a reference to the data as its
+only argument.
  
-Recall from the discussion about `asConst()` that we
-sometimes want to voluntarily restrict access to an otherwise
-mutable object. The `SYNCHRONIZED_CONST` pseudo-statement
-makes that intent easily realizable and visible to
-maintainers. For example:
+This has a few benefits compared to `lock()`:
+
+* The lambda expression requires its own nested scope, making critical
+  sections more visible in the code.  Callers are recommended to define
+  a new scope when using `lock()` if they choose to, but this is not
+  required.  `withLock()` ensures that a new scope must always be
+  defined.
+* Because a new scope is required, `withLock()` also helps encourage
+  users to release the lock as soon as possible.  Because the critical
+  section scope is easily visible in the code, it is harder to
+  accidentally put extraneous code inside the critical section without
+  realizing it.
+* The separate lambda scope makes it more difficult to store raw
+  pointers or references to the protected data and continue using those
+  pointers outside the critical section.
+
+For example, `withLock()` makes the range-based for loop mistake from
+above much harder to accidentally run into:
  
  ``` Cpp
-    void fun(Synchronized<fbvector<int>> & vec) {
-      fbvector<int> local;
-      SYNCHRONIZED_CONST (vec) {
-        CHECK(vec.size() > 42);
-        local = vec;
-      }
-      local.resize(42000);
-      SYNCHRONIZED (vec) {
-        local.swap(vec);
+    vec.withLock([](auto& locked) {
+      for (int& n : locked) {
+        n *= 2;
        }
-    }
+    });
  ```
  
-Inside a `SYNCHRONIZED_CONST(xyz)` scope, `xyz` is bound to a `const`-
-qualified datum. The corresponding lock is a read lock.
+This code does not have the same problem as the counter-example with
+`wlock()` above, since the lock is held for the duration of the loop.
  
-`SYNCHRONIZED_CONST` also has a two-arguments version, just like
-`SYNCHRONIZED`. In fact, `SYNCHRONIZED_CONST(a)` simply expands
-to `SYNCHRONIZED(a, a.asConst())` and `SYNCHRONIZED_CONST(a, b)`
-expands to `SYNCHRONIZED(a, (b).asConst())`. The type system and
-`SYNCHRONIZED` take care of the rest.
+When using `Synchronized` with a shared mutex type, it provides separate
+`withWLock()` and `withRLock()` methods instead of `withLock()`.
  
-### `TIMED_SYNCHRONIZED` and `TIMED_SYNCHRONIZED_CONST`
+#### Timed Locking
  
-These pseudo-statements allow you to acquire the mutex with a
-timeout. Example:
+When `Synchronized` is used with a mutex type that supports timed lock
+acquisition, `lock()`, `wlock()`, and `rlock()` can all take an optional
+`std::chrono::duration` argument.  This argument specifies a timeout to
+use for acquiring the lock.  If the lock is not acquired before the
+timeout expires, a null `LockedPtr` object will be returned.  Callers
+must explicitly check the return value before using it:
  
  ``` Cpp
-    void fun(Synchronized<fbvector<int>> & vec) {
-      TIMED_SYNCHRONIZED (10, vec) {
-        if (vec) {
-          vec->push_back(42);
-          CHECK(vec->back() == 42);
-        } else {
-            LOG(INFO) << "Dognabbit, I've been waiting over here for 10 milliseconds and couldn't get through!";
+    void fun(Synchronized<vector<string>>& vec) {
+      {
+        auto locked = vec.lock(10ms);
+        if (!locked) {
+          throw std::runtime_error("failed to acquire lock");
          }
+        locked->push_back("hello");
+        locked->push_back("world");
        }
+      LOG(INFO) << "successfully added greeting";
      }
  ```
  
-If the mutex acquisition was successful within a number of
-milliseconds dictated by its first argument, `TIMED_SYNCHRONIZED`
-binds its second argument to a pointer to the protected object.
-Otherwise, the pointer will be `NULL`. (Contrast that with
-`SYNCHRONIZED`), which always succeeds so it binds the protected
-object to a reference.) Inside the `TIMED_SYNCHRONIZED` statement
-you must, of course, make sure the pointer is not null to make
-sure the operation didn't time out.
-
-`TIMED_SYNCHRONIZED` takes two or three parameters. The first is
-always the timeout, and the remaining one or two are just like
-the parameters of `SYNCHRONIZED`.
+#### `unlock()` and `scopedUnlock()`
  
-Issuing `TIMED_SYNCHRONIZED` with a zero timeout is an
-opportunistic attempt to acquire the mutex.
-
-### `UNSYNCHRONIZED`
-
-`SYNCHRONIZED` is a good mechanism for enforcing scoped
+`Synchronized` is a good mechanism for enforcing scoped
  synchronization, but it has the inherent limitation that it
  requires the critical section to be, well, scoped. Sometimes the
  code structure requires a fleeting "escape" from the iron fist of
-synchronization. Clearly, simple cases are handled with sequenced
-`SYNCHRONIZED` scopes:
+synchronization, while still inside the critical section scope.
+
+One common pattern is releasing the lock early on error code paths,
+prior to logging an error message.  The `LockedPtr` class provides an
+`unlock()` method that makes this possible:
  
  ``` Cpp
      Synchronized<map<int, string>> dic;
      ...
-    SYNCHRONIZED (dic) {
-      if (dic.find(0) != dic.end()) {
-        return;
+    {
+      auto locked = dic.rlock();
+      auto iter = locked->find(0);
+      if (iter == locked.end()) {
+        locked.unlock();  // don't hold the lock while logging
+        LOG(ERROR) << "key 0 not found";
+        return false;
        }
+      processValue(*iter);
      }
-    LOG(INFO) << "Key 0 not found, inserting it."
-    SYNCHRONIZED (dic) {
-      dic[0] = "zero";
-    }
+    LOG(INFO) << "succeeded";
  ```
  
-For more complex, nested flow control, you may want to use the
-`UNSYNCHRONIZED` macro. It (only) works inside a `SYNCHRONIZED`
-pseudo-statement and temporarily unlocks the mutex:
+For more complex nested control flow scenarios, `scopedUnlock()` returns
+an object that will release the lock for as long as it exists, and will
+reacquire the lock when it goes out of scope.
  
  ``` Cpp
  
      Synchronized<map<int, string>> dic;
      ...
-    SYNCHRONIZED (dic) {
-      auto i = dic.find(0);
-      if (i != dic.end()) {
-        UNSYNCHRONIZED (dic) {
+    {
+      auto locked = dic.wlock();
+      auto iter = locked->find(0);
+      if (iter == locked->end()) {
+        {
+          auto unlocker = locked.scopedUnlock();
            LOG(INFO) << "Key 0 not found, inserting it."
          }
-        dic[0] = "zero";
+        locked->emplace(0, "zero");
        } else {
-        *i = "zero";
+        *iter = "zero";
        }
      }
-    LOG(INFO) << "Key 0 not found, inserting it."
-    SYNCHRONIZED (dic) {
-      dic[0] = "zero";
-    }
  ```
  
-Clearly `UNSYNCHRONIZED` comes with specific caveats and
-liabilities. You must assume that during the `UNSYNCHRONIZED`
+Clearly `scopedUnlock()` comes with specific caveats and
+liabilities. You must assume that during the `scopedUnlock()`
  section, other threads might have changed the protected structure
  in arbitrary ways. In the example above, you cannot use the
-iterator `i` and you cannot assume that the key `0` is not in the
+iterator `iter` and you cannot assume that the key `0` is not in the
  map; another thread might have inserted it while you were
  bragging on `LOG(INFO)`.
  
-### `SYNCHRONIZED_DUAL`
+Whenever a `LockedPtr` object has been unlocked, whether with `unlock()`
+or `scopedUnlock()`, it will behave as if it is null.  `isNull()` will
+return true.  Dereferencing an unlocked `LockedPtr` is not allowed and
+will result in undefined behavior.
+
+#### `Synchronized` and `std::condition_variable`
+
+When used with a `std::mutex`, `Synchronized` supports using a
+`std::condition_variable` with its internal mutex.  This allows a
+`condition_variable` to be used to wait for a particular change to occur
+in the internal data.
+
+The `LockedPtr` returned by `Synchronized<T, std::mutex>::lock()` has a
+`getUniqueLock()` method that returns a reference to a
+`std::unique_lock<std::mutex>`, which can be given to the
+`std::condition_variable`:
+
+``` Cpp
+    Synchronized<vector<string>, std::mutex> vec;
+    std::condition_variable emptySignal;
+
+    // Assuming some other thread will put data on vec and signal
+    // emptySignal, we can then wait on it as follows:
+    auto locked = vec.lock();
+    emptySignal.wait_for(locked.getUniqueLock(),
+                         [&] { return !locked->empty(); });
+```
+
+### `acquireLocked()`
  
  Sometimes locking just one object won't be able to cut the mustard. Consider a
  function that needs to lock two `Synchronized` objects at the
  same time - for example, to copy some data from one to the other.
-At first sight, it looks like nested `SYNCHRONIZED` statements
-will work just fine:
+At first sight, it looks like sequential `wlock()` calls will work just
+fine:
  
  ``` Cpp
-    void fun(Synchronized<fbvector<int>> & a, Synchronized<fbvector<int>> & b) {
-      SYNCHRONIZED (a) {
-        SYNCHRONIZED (b) {
-          ... use a and b ...
-        }
-      }
+    void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
+      auto lockedA = a.wlock();
+      auto lockedB = b.wlock();
+      ... use lockedA and lockedB ...
      }
  ```
  
@@ -523,22 +502,35 @@ This classic problem has a classic solution: all threads must
  acquire locks in the same order. The actual order is not
  important, just the fact that the order is the same in all
  threads. Many libraries simply acquire mutexes in increasing
-order of their address, which is what we'll do, too. The pseudo-
-statement `SYNCHRONIZED_DUAL` takes care of all details of proper
-locking of two objects and offering their innards:
+order of their address, which is what we'll do, too. The
+`acquireLocked()` function takes care of all details of proper
+locking of two objects and offering their innards.  It returns a
+`std::tuple` of `LockedPtr`s:
  
  ``` Cpp
-    void fun(Synchronized<fbvector<int>> & a, Synchronized<fbvector<int>> & b) {
-      SYNCHRONIZED_DUAL (myA, a, myB, b) {
-        ... use myA and myB ...
-      }
+    void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
+      auto ret = folly::acquireLocked(a, b);
+      auto& lockedA = std::get<0>(ret);
+      auto& lockedB = std::get<1>(ret);
+      ... use lockedA and lockedB ...
      }
  ```
  
-To avoid potential confusions, `SYNCHRONIZED_DUAL` only defines a
-four-arguments version. The code above locks `a` and `b` in
-increasing order of their address and offers their data under the
-names `myA` and `myB`, respectively.
+Note that C++ 17 introduces
+(structured binding syntax)[(http://wg21.link/P0144r2)]
+which will make the returned tuple more convenient to use:
+
+``` Cpp
+    void fun(Synchronized<vector<int>>& a, Synchronized<vector<int>>& b) {
+      auto [lockedA, lockedB] = folly::acquireLocked(a, b);
+      ... use lockedA and lockedB ...
+    }
+```
+
+An `acquireLockedPair()` function is also available, which returns a
+`std::pair` instead of a `std::tuple`.  This is more convenient to use
+in many situations, until compiler support for structured bindings is
+more widely available. 
  
  ### Synchronizing several data items with one mutex
  
@@ -565,10 +557,10 @@ For example:
        ...
      };
      ...
-    SYNCHRONIZED (bymap_) {
-      bymap_.direct[0] = "zero";
-      bymap_.inverse["zero"] = 0;
-    }
+    bimap_.withLock([](auto& locked) {
+      locked.direct[0] = "zero";
+      locked.inverse["zero"] = 0;
+    });
  ```
  
  With this code in tow you get to use `bimap_` just like any other
@@ -586,10 +578,10 @@ rewritten for the same functionality like this:
        ...
      };
      ...
-    SYNCHRONIZED (bymap_) {
-      get<0>(bymap_)[0] = "zero";
-      get<1>(bymap_)["zero"] = 0;
-    }
+    bimap_.withLock([](auto& locked) {
+      get<0>(locked)[0] = "zero";
+      get<1>(locked)["zero"] = 0;
+    });
  ```
  
  The code uses `std::get` with compile-time integers to access the