correctness-model/writeup/formalization.tex

   1 \mysection{Formalization of Correctness Model}\label{sec:formalization}
   2
   3 \todo{discuss ordering point --- why does ordering point make sense? why do we
   4 want them}
   5
   6 Unlike the SC memory model, finding an appropriate correctness model for
   7 concurrent data structures under the C/C++11 memory model is challenging by the
   8 fact that the C/C++ memory model allows atomic loads to read from atomic stores
   9 that appear later in the trace and that it is in general impossible to produce a
  10 sequential history that preserves program order for the C/C++ memory model.
  11
  12 Developers usually leverage the relaxed atomics provided by the C/C++11 memory
  13 model to gain performace by introducing weaker semantics to their users.
  14 However, the semantics should not be arbitrarily relaxed since users may have a
  15 very difficult time using the data structure implemntations. Intuitively, if
  16 method $A$ has an information flow to method $B$, developers and users generally
  17 assume method $B$ to see the effects before method $A$. Specifically in C/C++11,
  18 we want method $A$ to have \code{release} semantics while method $B$ to have
  19 \code{acquire} semantics so that they establish the happens-before relationship.
  20 For example, for a concurrent queue, it is necessary that a dequeuer synchronize
  21 with the corresponding enqueuer so that when the dequeuer obtains a reference to
  22 an object, it can read the fully initialized value of that object. To some
  23 degree this can also avoid data races. When it comes to C/C++11 data structures,
  24 the ordering points of method calls should have release/acquire semantics on
  25 stores and loads.
  26
  27 However, even if we require data structures to have release/acquire semantics on
  28 method calls, it still remains a challenge to apply existing correctness model
  29 to C/C++11 data structures to generate a correct sequential history that is
  30 consistent with the program order. In order to further illustrate such a
  31 challenge, we define \moc as the \textit{modification order consistency}
  32 relation between memory operations.  Given two operations $X$ and $Y$, the
  33 \moc edges are as follow:
  34
  35 \squishcount
  36 \item {\bf Modification Order Consistency (write-write):} $X \xrightarrow{mo} Y
  37 \Rightarrow X \xrightarrow{moc} Y$.
  38
  39 \item {\bf Modification Order Consistency (read-write):} $A \xrightarrow{mo} Y \
  40 \wedge \  A \xrightarrow{rf} X \Rightarrow X \xrightarrow{moc} Y$.
  41
  42 \item {\bf Modification Order Consistency (write-read):} $X \xrightarrow{mo} B \
  43 \wedge \  B \xrightarrow{rf} Y \Rightarrow X \xrightarrow{moc} Y$.
  44
  45 \item {\bf Modification Order Consistency (read-read):} $A \xrightarrow{mo} B \
  46 \wedge \  A \xrightarrow{rf} X \ \wedge \  B \xrightarrow{rf} Y \Rightarrow X
  47 \xrightarrow{moc} Y$.
  48 \countend
  49 \vspace{0.3cm}
  50
  51 When we generate a sequential history, the \moc relation imposes
  52 restrictions on operations on the same memory location such that every load
  53 operation reads from the most recent store operation in that history. In
  54 general, it can be impossible to generate any sequential history that is
  55 consistent with the \moc relation and the \hb relation. Consider the
  56 following example:
  57
  58 {
  59 \footnotesize
  60 \begin{verbatim}
  61  Thrd 1:              Thrd 2:
  62  x = 1;               y = 1;
  63  r1 = y;              r2 = x;
  64 \end{verbatim}
  65 }
  66
  67 In this example, shared variables \code{x} and \code{y} are both initilly
  68 \code{0}.  Each store operation has \code{release} semantics and each load
  69 operation has \code{acquire} semantics. For the execution where both \code{r1}
  70 and \code{r2} obtain the old value \code{0}, we cannot generate a sequential
  71 history consistent with the \hb and \moc relation. Since both load
  72 operations reads from the initial store operations, there exists an \moc
  73 edge from each load to the corresponding store operation. Since the intra-thread
  74 order (\textit{sb}) is a part of the \hb relation, a cycle exists in the union
  75 of \hb and \moc relation, making it impossible to topologically sort the
  76 operations to generate a consistent sequential history.
  77
  78 However, while maintaining the \moc relation, we can weaken some
  79 constraints, e.g.  eliminating some \textit{happens-before} edges, to generate a
  80 history that is consistent with the union of \hb and \moc relation. For
  81 instance, in the above example, we do not necessarily have to maintain the
  82 intra-thread order in both threads because they are independent operations and
  83 the later operation is a load.
  84
  85 To be specific, in order to make the sequential history intuitive, we want to
  86 keep the happens-before edges from store to store, from load to store because
  87 that can ensure information is only flowing from earlier store operations.
  88 Besides, we also want to keep the happens-before edges between operations on the
  89 same memory location since that is consistent with the dependency between
  90 operations on the same location. However, a correctness model with these
  91 intuitions still does not work in C/C++ in general.  Consider the following
  92 example:
  93
  94 {
  95 \footnotesize
  96 \begin{verbatim}
  97  Thrd 1:     Thrd 2:     Thrd 3:     Thrd 4:
  98  x = 1;      y = 2;      r1 = x;     r3 = y;
  99  y = 1;      x = 2;      r2 = x;     r4 = y;
 100 \end{verbatim}
 101 }
 102
 103 We encounter a challenge for the execution where the following holds:
 104 \path{r1 == 2 && r2 == 1 && r3 == 1 && r4 == 2}. Operation \code{y = 1} has an
 105 \moc edge to operation \code{y = 2}, and operation \code{x = 2} also has
 106 an edge to operation \code{x = 1}. As a result, there exists a cycle in the
 107 union of the \moc and \hb relation.
 108
 109 To solve this, one option is to allow load operations to move up any operation
 110 including load and store operations on the same location. However, this can be
 111 extremely conuter-intuitive for a later load operation in \hb relation to read
 112 an older value than its previous load operation.  Actually, the problem here is
 113 that the store operations with different values to the same memory location are
 114 not ordered by \hb relation. This is a really rare case because it could be
 115 possible that only one store operation takes effect and all other store
 116 operations are useless. We believe that such a case of blind store operations is
 117 not the general pattern for concurrent data structures implementation. Instead,
 118 we believe that store operations to the same memory location should be ordered
 119 by \hb relation.
 120
 121 We first define two categories of memory operations as follow:
 122 \begin{eqnarray*}
 123 s \in \StoreOp &=& \{store\} \times \Address \times \Value \cup \\
 124 &&\{rmw\} \times \Address \times \Value \\
 125 l \in \LoadOp &=& \{load\} \times \Address.
 126 \end{eqnarray*}
 127
 128 We next define a \textit{relaxing operation} --- $r(x, y)$ on relation $R$ ($(x,
 129 y) \in R$), and it generates a new relation, denoted as relation $R_{r(x, y)}$
 130 (or $R_r$ for short) which is defined as follow:
 131
 132 $ R_{r(x, y)} = \{ (a, b) \mid [ (a, b) \in R \wedge \neg [ (a, b) == (x, y)
 133 \wedge \movable(x, y) ]] \vee [ b == y \wedge (a, x) \in R \wedge \movable(x, y) ]$,
 134 where $ \movable(x, y) = \{ \Address(x) \not= \Address(y) \wedge y \in \LoadOp \}$.
 135
 136 We also define a \textit{transform operation} --- $T$ on relation $R$, as a
 137 finite number of relaxing operations, $r1, r2,..., rn$ that are cumulatively
 138 operated on $R$, as follow:
 139
 140 $R_T = R_{r1r2...rn} = (((R_{r1})_{r2})...)_{rn}$ .
 141
 142 Furthermore, we define a special \textit{transform operation} $UT$ on relation
 143 $R$, where any further \textit{relaxing operation} $r$ on $R_{UT}$
 144
 145 Under these analysis and assumptions, we present Lemma~\ref{seqHistoryExistence}.
 146
 147 \begin{lemma} \label{seqHistoryExistence}
 148 For any legal C/C++11 execution, where store / load operations have release /
 149 acquire semantics and store operations to the same memory location are ordered
 150 by \hb relation, there exists a sequential history that is consistent with the
 151 \moc relation and the \hb relation in terms of store to store, load to store and
 152 operations on the same memory location.
 153 \end{lemma}
 154
 155 \begin{proof}
 156 To prove Lemma~\ref{seqHistoryExistence}, we present in
 157 Figure~\ref{fig:algorithmConsistentHistory} our constructive algorithm that
 158 generates a sequential history and prove that such a history satisfies the
 159 condition. The algorithm first topologically sort operations by \hb, and then it
 160 only moves up certain load operations (Line~\ref{line:moveUpLoads}); thus, the
 161 \hb relation in terms of store to store and load to store holds. Since load
 162 operations that read from the correct store is not moved
 163 (Line~\ref{line:loadSet}) and store operations are ordered by \hb relation, the
 164 \moc relation in terms of write-write and write-read operations holds. By
 165 induction, moving up load operations will end up placing load operations between
 166 the store operation that they read from and the next store operation, and thus
 167 the \moc relation in terms of read-read and read-write operations also holds. As
 168 a result, the \moc relation holds. We then illustrates that the \hb relation
 169 between the same memory location also holds by contradiction. The only
 170 possibility of moving load operations across operations on the same location
 171 lies in Line~\ref{line:loadSet} where there exists an operation $o$ between $w'$
 172 and $r$, and $o$ is not moved up while some operation $r'$ in $l'$ is moved up,
 173 and $o$ happens before $r'$. If $o$ is a load operation, $o$ must read from $w$
 174 otherwise $o$ will be moved up. Thus we have the relation: $w \relation{hb} w'
 175 \relation{hb} o \relation{hb} r'$, which contradicts the fact that $r$ reads
 176 from $w$. Likewise, if $o$ is a store operation, since store operations are
 177 ordered by \hb relation, we have the relation: $w \relation{hb} o \relation{hb}
 178 r'$ and this is a contradiciton similarly.  Therefore, the generated sequential
 179 history by this algorithm satisfies the conditions in this Lemma.
 180 \end{proof}
 181
 182 \begin{figure}[!htbp]
 183 \begin{algorithmic}[1]
 184 \Function{GenerateConsistentHistory}{trace}
 185 \State $H$ := topologically sort operations of trace by \hb
 186 \ForAll{load operation $r$ in $H$ in reverse order}
 187 \State $w$ := store operation that $r$ reads from
 188 \State $m$ := memory location of $r$
 189 \If{no store operations on $m$ between $w$ and $r$}
 190 \State continue
 191 \Else
 192 \State $w'$ := most recent store on $m$ before $r$
 193 \State  \parbox[t]{\dimexpr\linewidth-\algorithmicindent}{$l'$ := set of \label{line:loadSet}
 194 load operations on $m$ between $w'$ and $r$\\
 195 that do not read from $w'$ (including $r$)  \strut}
 196 \State move $l'$ (keep inner order) above $w$ and continue \label{line:moveUpLoads}
 197 \EndIf
 198 \EndFor
 199 \State \Return{$H$} \label{line:finalResults}
 200 \EndFunction
 201 \end{algorithmic}
 202 \caption{\label{fig:algorithmConsistentHistory}Algorithm for generating a consistent sequential history}
 203 \end{figure}
 204
 205 \begin{lemma} \label{equivalence}
 206 For any legal C/C++11 execution, where store / load operations have release /
 207 acquire semantics and store operations to the same memory location are ordered
 208 by \textit{happens-before} relation, the sequential history $H$ generated by the
 209 algorithm in Figure~\ref{fig:algorithmConsistentHistory} is equivalent to some
 210 topological sorting of the operations by the \moc relation and $\hb_T$ relation,
 211 where $T$ is some \textit{transform operation} on the \hb relation.
 212 \end{lemma}
 213
 214 \begin{proof}
 215 In the algorithm showed in Figure~\ref{fig:algorithmConsistentHistory}, during
 216 each iteration there can be a set of load operations $l'$
 217 (Line~\ref{line:loadSet}) to be moved up across the most recent store operation
 218 $w'$. For any load opeartion $l''$ in the set $l'$, as showed in the proof of
 219 Lemma~\ref{seqHistoryExistence}, any operations on the same location as $l''$
 220 between $w$ and $r$ must be in $l'$.  When $l''$ is moved across an operation
 221 $o$ that happens before $l''$, we add a \textit{relaxing opeartion} $r(o, l')$
 222 to the \textit{transform operation} $T$ on the \hb relation. When it finishes
 223 generating $H$, we have a \textit{transform} $T$ on the \hb relation, and we
 224 denote that $\hb_T$ as the relation on \hb after taking the operation $T$. Since
 225 each \textit{relaxing operation} $r(o, l')$ eliminates the happens-before edge
 226 from $o$ to $l'$, history $H$ should be consistent with the $\hb_T$ relation and
 227 \moc relation. On the other hand, it means we can topologically sort the
 228 operations by the $\hb_T$ and \moc relation and $H$ is one possible sorting.
 229 \end{proof}
 230
 231 \todo{formalize composability and show hat our correctness model is composable}
 232 1. Should we still assume that store operations to the same memory location are
 233 ordered?
 234 2. Or is it safe to assume that variables from two release/acquire consistent
 235 objects should be disjoint?
 236
 237
 238 First of all, we represent a trace as an \textit{execution graph}, where each
 239 node represents each API method call with a set of ordering points, and edges
 240 between nodes are derived from the \textit{happens-before} edges and the
 241 \textit{modification order} edges between ordering points. We define
 242 \textit{opo} as the \textit{ordering point order} relation between ordering
 243 point.  Given two operations $X$ and $Y$ that are both ordering points, the
 244 \textit{modification order consistency} edges are as follow: