correctness-model/writeup/example.tex

   1 \mysection{Motivating Example}
   2 \label{sec:example}
   3
   4 We will use the read-copy update (RCU)~\cite{rcu} implementation in
   5 Figure~\ref{fig:rcuexample} as a running example.  RCU data structures
   6 are useful for usage scenarios with large numbers of reads and
   7 relatively few updates.  RCU based data structures scale to large
   8 numbers of readers because read operations do not contain any
   9 low-level store operations and thus do not generate coherence traffic to invalidate cache lines.
  10 Lines~\ref{line:rcuBegin} through~\ref{line:rcuEnd} contain
  11 the code for the RCU implementation, and
  12 Lines~\ref{line:testcaseBegin} through~\ref{line:testcaseEnd} show a
  13 test case for the implementation.
  14
  15 The RCU implementation maintains a shared pointer
  16 (Line~\ref{line:rcuSharedData})
  17 \code{node} to reference the data shared by readers and
  18 updaters. A reader loads the shared pointer (Line~\ref{line:rcuReadLoad}) to read
  19 the shared data, while an updater allocates a new struct
  20 (Line~\ref{line:rcuUpdateAlloc}), loads the shared pointer
  21 (Line~\ref{line:rcuUpdateLoad}), and then attempts a compare and swap (CAS) operation to update the shared pointer.  This process repeats until the CAS is successful.
  22
  23 \begin{figure}[h]
  24 \vspace{-.2cm}
  25 \begin{lstlisting}[xleftmargin=6.0ex]
  26 #define mo_acquire memory_order_acquire
  27 #define mo_acq_rel memory_order_acq_rel
  28 struct Node {/*@ \label{line:rcuBegin} @*/
  29         int data ;
  30         int version;
  31 };
  32
  33 class RCU {
  34         atomic<Node*> node;/*@ \label{line:rcuSharedData} @*/
  35         public:
  36         RCU() {
  37                 Node *n = new Node;
  38                 n->data= 0;
  39                 n->version = 0;
  40                 atomic_init(&node, n);
  41         }
  42         void read(int *data, int *version) {
  43                 Node *res = node.load(mo_acquire);/*@ \label{line:rcuReadLoad} @*/
  44                 *data = res->data;
  45                 *version = res->version;
  46         }
  47         void update(int data) {
  48                 bool succ = false;
  49                 Node *newNode = new Node;/*@ \label{line:rcuUpdateAlloc} @*/
  50                 Node *prev = node.load(mo_acquire);/*@ \label{line:rcuUpdateLoad} @*/
  51                 do {
  52                         newNode->data = data;
  53                         newNode->version = prev->version + 1;
  54                         succ = node.compare_exchange_strong(prev,/*@ \label{line:rcuUpdateCAS} @*/
  55                                 newNode, mo_acq_rel, mo_acquire);/*@ \label{line:rcuUpdateCASMO} @*/
  56                 } while (!succ);
  57         }
  58 };/*@ \label{line:rcuEnd} @*/
  59
  60 RCU x, y; // Define two RCU objects/*@ \label{line:testcaseBegin} @*/
  61 int r1, r2, v1, v2;
  62 void thrd_1() { // Thread 1/*@ \label{line:thrd1} @*/
  63         x.update(1);
  64         y.read(&r1, &v1); // r1 == 0
  65 }
  66 void thrd_2() { // Thread 2/*@ \label{line:thrd2} @*/
  67         y.update(1);
  68         x.read(&r2, &v2); // r2 == 0
  69 }/*@ \label{line:testcaseEnd} @*/
  70 \end{lstlisting}
  71 \vspace{-.2cm}
  72 \caption{\label{fig:rcuexample}C++11 read-copy update example}
  73 \vspace{-.3cm}
  74 \end{figure}
  75
  76 \begin{figure}[t]
  77         \centering
  78 \vspace{-.3cm}
  79         \resizebox{\columnwidth}{!}{
  80                 \includegraphics[scale=0.33]{figures/rcuhistory}
  81         }
  82   \vspace{-.4cm}
  83                 \caption{\label{fig:rcuhistory}History of a possible execution of two RCU objects}
  84 \vspace{-.4cm}
  85 \end{figure}
  86
  87 This implementation uses low-level atomic operations provided by
  88 C/C++ to implement the RCU mechanism. For example, in
  89 Line~\ref{line:rcuUpdateCAS}, the \code{update} method uses a CAS
  90 operation to update the \code{node}. The memory
  91 order parameters attached to the atomic operations ensure that memory
  92 accesses that were performed before the \code{update} operation must
  93 occur before memory accesses that were performed after the \code{read}
  94 operation.
  95
  96 \mysubsection{Consistency Model}
  97 \label{sec:exampleMotivation}
  98
  99 In the example code, two threads, Thread~1 and Thread~2, each update
 100 one of the RCU objects \code{x} and \code{y}, respectively, and then
 101 access the object updated by the other thread.  Under the C/C++
 102 memory model, this example admits an execution in which both Thread~1
 103 and Thread~2 read the initial value 0.
 104
 105 Figure~\ref{fig:rcuhistory} shows a possible order for this
 106 execution in which the model checker processes statements in the order shown in
 107 the
 108 execution labeled Original Trace\footnote{Note that the C/C++ memory
 109 model does not include any notion of a trace.  We merely use the term
 110 trace to refer to the order in which \TOOL prints the execution.},
 111 which is an interleaving of the invocation and response events of
 112 API methods indicated by \lq\lq\_i\rq\rq and \lq\lq\_r\rq\rq,
 113 respectively.
 114
 115 This trace is rather counter-intuitive because it orders
 116 Thread~1's \code{update} event before Thread~2's \code{read} event,
 117 yet Thread~2's \code{read} event does not observe the updated value.
 118 However, under the C/C++11 memory model, this is allowed since a load
 119 is allowed to read a value that is written by an old store.
 120
 121 It is important to note that it is impossible to produce a sequential history
 122 for this example that preserves program order and produces the observed
 123 behavior.  This means that traditional notions of correctness that relate
 124 concurrent executions to sequential executions that were developed for
 125 the SC memory model cannot directly be applied to the C/C++11 memory
 126 model.
 127
 128 This example leads to the following conclusion --- if we want to
 129 define the behavior of concurrent executions by relating them to
 130 equivalent sequential histories, then we must give up on preserving
 131 program order.
 132
 133 We note that for this example, if we give up on program order, we
 134 produce the sequential history in Figure~\ref{fig:rcuhistory}
 135 labeled Reordered Trace.  The data structure operations in this
 136 sequential history have behavior that is consistent with the observed
 137 behaviors in the example execution.
 138
 139 \mysubsection{Synchronization Properties}
 140
 141 A correct RCU data structure ensures the property that the invocation
 142 event of an \code{update} operation happens-before the response event
 143 of the subsequent
 144 \code{read} operation. Informally, all operations that appear before the \code{update} operation must appear to occur before the operations that appear after the \code{read} operation.  While these
 145 semantics follow naturally for SC, for relaxed memory models we must
 146 ensure that neither the compiler nor the processor performs
 147 reorderings that violate these semantics. Therefore, when a reader
 148 reads the fields of the
 149 \code{node},
 150 data races are eliminated since the \code{read} operation synchronizes with the \code{update} operation.
 151
 152
 153 In the C/C++11 memory model, the transitive closure of
 154 \textit{sequence-before} and \textit{synchronizes-with} relations form the
 155 \textit{happens-before} relation in the absence of operations with the \code{consume} memory order.  Since every time an \code{update} operation finishes, a CAS operation to
 156 \code{node} with release semantics is executed, and the subsequent \code{read}
 157 loads the \code{node} reference with acquire semantics, the \code{read} operation synchronizes
 158 with the \code{update} operation, providing the necessary synchronization properties.
 159 Some RCU implementations may use consume semantics instead of acquire semantics,
 160 however we do not use consume for the following two reasons: (1) most existing
 161 C/C++ compilers do not support consume semantics and thus acquire does not
 162 involve extra overhead and (2) using consume semantics provides a more complex interface to
 163 clients that may require them to explicitly declare dependences throughout the code.
 164
 165