correctness-model/writeup/specification.tex

   1 \mysection{Specification Language Design}\label{sec:specification}
   2
   3 We begin by overviewing \TOOL's basic approach.  The \TOOL
   4 specification language specifies the correctness properties for
   5 concurrent data structures by establishing a correspondence with an
   6 equivalent sequential data structure, which requires: (1) defining the
   7 equivalent sequential data structure; (2) establishing an equivalent
   8 sequential history; (3) relating the behavior of the concurrent
   9 execution to the sequential history; and (4) specifying the
  10 synchronization properties between method invocations. The
  11 specification language has the following key components:
  12
  13 \mypara{\bf 1. Equivalent Sequential Data Structure:} This component defines a
  14 sequential data structure against which the concurrent data structure
  15 will be checked.
  16
  17 \mypara{\bf 2. Defining the Equivalent Sequential History:}
  18 For real-world data structures, generally there exist two types of API methods,
  19 \textit{primitive} API methods and \textit{aggregate} API methods. Take
  20 concurrent hashtables as an example, it provides primitive API methods
  21 such as \code{put} and \code{get} that implement the core functionality of the
  22 data structure, and it also has aggregate API methods such
  23 as \code{putAll} that calls primitive API methods internally. From our
  24 experience working with our benchmark set, it is generally possible to
  25 generate a sequential history of primitive API method invocations for
  26 real-world data structures as they generally serialize on a single memory
  27 location --- while it is possible to observe partially completed aggregate
  28 API method invocations.  Therefore, our specifications will focus on the
  29 correctness of primitive API methods.
  30
  31 Borrowing from VYRD~\cite{vyrd} the concept of commit points, we allow
  32 developers to specify \textit{ordering points} --- the specific atomic
  33 operations between method invocation and response events that are used for
  34 ordering method calls.  Ordering points are a generalization of the commit
  35 points as they also use memory operations that do not commit the data structure
  36 operation to order method calls. Developers then specify ordering points to
  37 generate the equivalent sequential history.
  38
  39 \mypara{\bf 3. Specifying Correct Behaviors:} Developers use
  40 the \TOOL specification to provide a set of \textit{side effects}
  41 and \textit{assertions} to describe the desired behavior of each API
  42 method.  Side effects capture how a method updates the equivalent
  43 sequential data structure.  Assertions include both preconditions and
  44 postconditions which specify what conditions each method must satisfy
  45 before and after the method call, respectively. The specification of
  46 the side effects and assertions may make use of the states of the
  47 equivalent sequential data structure, meaning that these components
  48 can access the internal variables and methods of the equivalent sequential data
  49 structure. Additionally, they can reference the values available at
  50 the method invocation and response, i.e., the parameter values and the
  51 return value.
  52
  53 \mypara{\bf 4. Synchronization:} The synchronization  specification
  54 describes the desired happens before relations between API method
  55 invocations.  \TOOL specifications allow developers to specify the
  56 properties at the abstraction of methods.  This makes specifications
  57 cleaner, more understandable, and less error-prone because the
  58 properties do not rely on low-level implementation
  59 details. Moreover, \TOOL specifications allow developers to attach
  60 conditions to methods when specifying synchronization properties so
  61 that a method call might synchronize with another only under a
  62 specific condition.  For example, consider a spin lock with
  63 a \code{try\_lock()} method, we need to specify that only a
  64 successful \code{try\_lock()} method invocation must synchronize with the
  65 previous \code{unlock()} method invocation.
  66
  67 Figure~\ref{fig:speccfg} presents the grammar for the \TOOL specification
  68 language.  The grammar defines three types of specification annotations:
  69 \textit{structure annotations}, \textit{method annotations}, and
  70 \textit{ordering point annotations}.
  71 In the grammar, \textit{code} means legal C/C++ source code; and
  72 \textit{label} means a legal C/C++ variable name.
  73 Annotations are embedded in C/C++ comments.
  74 This format does not affect the semantics of the original
  75 program and allows for the same source to be used by both a standard C/C++
  76 compiler to generate production code and for the \TOOL specification compiler to
  77 extract the \TOOL specification from the comments.  We discuss these constructs in more detail throughout the remainder of this section.
  78
  79 \begin{figure}[!tb]
  80 \vspace{-.2cm}
  81 {\scriptsize
  82         \begin{eqnarray*}
  83         \textit{\textbf{structureSpec}} & \rightarrow & \code{"@Structure\_Define"}\\
  84  && \textit{structureDefine} \ \ \ (\textit{happensBefore})?\\
  85         \textit{structureDefine} & \rightarrow &
  86 (\code{"@DeclareStruct:"} \
  87         \textit{code})\text{*}\\
  88         && \code{"@DeclareVar:"} \ \textit{code}\\
  89         && \code{"@InitVar:"} \ \textit{code}\\
  90         && (\code{"@DefineFunc:"} \ \textit{code})\text{*}\\
  91         \textit{happensBefore} & \rightarrow & \code{"@Happens\_Before:"} \
  92         (\textit{invocation} \ \code{"->"} \
  93         \textit{invocation})\textsuperscript{+}\\
  94         \textit{invocation} & \rightarrow & \textit{label} \ (\ \  \code{"("} \
  95         \textit{label} \code{")"} \ \ )?\\
  96         \textit{\textbf{methodSpec}} & \rightarrow & \code{"@Method:"} \ \textit{label}\\
  97         && \code{"@Ordering\_Point\_Set:"} \ \textit{label} \ (\code{"|"} \
  98         \textit{label})\text{*}\\
  99         && (\code{"@HB\_Condition:"} \ \textit{label} \ \code{"::"} \ \textit{code})\text{*}\\
 100         && (\code{"@ID:"} \ \textit{code})?\\
 101         && (\code{"@PreCondition:"} \ \textit{code})?\\
 102         && (\code{"@SideEffect:"} \ \textit{code})?\\
 103         && (\code{"@PostCondition:"} \ \textit{code})?\\
 104         \textit{\textbf{potentialOP}} & \rightarrow & \code{"@Potential\_Ordering\_Point:"} \ \textit{code}\\
 105         && \code{"@Label:"} \ \ \textit{label}\\
 106         \textit{\textbf{opCheck}} & \rightarrow & \code{"@Ordering\_Point\_Check:"} \ \textit{code}\\
 107         && \code{"@Potential\_Ordering\_Point\_Label:"} \ \textit{label}\\
 108         && \code{"@Label:"} \ \ \textit{label}\\
 109         \textit{\textbf{opLabelCheck}} & \rightarrow & \code{"@Ordering\_Point\_Label\_Check:"} \ \textit{code}\\
 110         && \code{"@Label:"} \ \ \textit{label}\\
 111         \textit{\textbf{opClear}} & \rightarrow & \code{"@Ordering\_Point\_Clear:"} \ \textit{code}\\
 112         && \code{"@Label:"} \ \ \textit{label}\\
 113         \textit{code} & \rightarrow & \code{<Legal C/C++ code>}
 114         \end{eqnarray*}}
 115 \vspace{-.7cm}
 116 \caption{\label{fig:speccfg}Grammar for \TOOL specification language}
 117 \vspace{-.3cm}
 118 \end{figure}
 119
 120
 121 \mysubsection{Example}
 122
 123 To make the \TOOL specification language more concrete,
 124 Figure~\ref{fig:rcuSpecExample} presents the annotated RCU example. We
 125 use ordering points to order the method invocations to construct
 126 the equivalent sequential history for the concurrent execution.  We
 127 first specify the ordering points for the \code{read}
 128 and \code{update} methods to define the equivalent sequential
 129 history. For the
 130 \code{read} method, the load access of the \code{node} field in
 131 Line~\ref{line:rcuSpecReadLoad} is the only operation that can be an ordering
 132 point.  For the \code{update} method, there exists more than one atomic
 133 operation. However, they only serve as an ordering point when it successfully uses the CAS operation
 134 to update the \code{node} field. Thus, we specify in
 135 Line~\ref{line:rcuSpecUpdateOPBegin} that the CAS operation should be the
 136 ordering point for the update operation when \code{succ} is \code{true}. We
 137 associate the methods with the two ordering points in
 138 Lines~\ref{line:rcuSpecReadInterfaceOPSet} and
 139 \ref{line:rcuSpecUpdateInterfaceOPSet}.
 140
 141 Next, we specify the equivalent sequential data structure for this RCU implementation.
 142 Line~\ref{line:declVar} declares two integer fields \code{\_data} and
 143 \code{\_version} as the internal states of the equivalent sequential RCU, and
 144 Line~\ref{line:initVar} initializes both variables to \code{0}.
 145
 146 We then specify the correct behaviors of the equivalent sequential
 147 history by defining the side effects and assertions for
 148 the \code{read} and \code{update} methods.  Side effects specify how
 149 to perform the corresponding operation on the equivalent sequential
 150 data structure.  For example,
 151 Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} specifies the side
 152 effects of the \code{update} to the sequential states.
 153 Assertions specify properties that should be true of the concurrent data structure execution.
 154 For example, Line~\ref{line:rcuSpecReadInterfacePostCondition}
 155 specifies that the \code{read} method should satisfy the postcondition that the
 156 returned \code{data} and \code{version} fields should have consistent values
 157 as the internal states of the equivalent sequential data structure.
 158
 159 Our implementation of the RCU data structure is designed to establish
 160 synchronization between the \code{update} method invocation and the later
 161 \code{read} or \code{update} method invocation.  This is important, for
 162 example, if a client thread were to update an array index and use the RCU data
 163 structure to communicate the updated index to a second client thread.  Without
 164 synchronization, the second client thread may not see the update to the array
 165 index. Besides, the synchronization between two \code{update} calls ensures that
 166 a later \code{read} will see the most updated values.
 167 In order to specify the synchronization, we associate \code{read} and \code{update} methods with method call identifiers
 168 (Lines~\ref{line:rcuSpecReadInterfaceID} and~\ref{line:rcuSpecUpdateInterfaceID}), which for this example is the value of the
 169 \code{this} pointer.
 170 Together these annotations ensure that every
 171 API method call on the same RCU object will have the same method call ID.
 172 Line~\ref{line:rcuSpecHB} then specifies
 173 the synchronization properties for the RCU --- any \code{update} method invocation
 174 should synchronize with all later \code{read} and \code{update} method invocations that have the
 175 same method call identifier as that \code{update} method. This guarantees that
 176 \code{update} calls should synchronize with the \code{read} and \code{update}
 177 calls of the same RCU object.
 178
 179
 180 \begin{figure}[h!]
 181 \vspace{-.2cm}
 182 \begin{lstlisting}[xleftmargin=6.0ex]
 183 class RCU {
 184         /** @Structure_Define:
 185                 @DeclareVar:/*@ \label{line:declVar} @*/ int _data, _version;
 186                 @InitVar:/*@ \label{line:initVar} @*/ _data = 0; _version = 0;
 187         @Happens_Before: Update->Read  Update->Update*//*@ \label{line:rcuSpecHB} @*/
 188         atomic<Node*> node;
 189         public:
 190         RCU() {
 191                 Node *n = new Node;
 192                 n->data = 0;
 193                 n->version = 0;
 194                 atomic_init(&node, n);
 195         }
 196         /** @Interface: Read/*@ \label{line:rcuSpecReadInterfaceLabel} @*/
 197         @Ordering_Point_Set: Read_Point/*@ \label{line:rcuSpecReadInterfaceOPSet} @*/
 198         @ID: this/*@ \label{line:rcuSpecReadInterfaceID} @*/
 199         @PostCondition:/*@ \label{line:rcuSpecReadInterfacePostCondition} @*/
 200                 _data == *data && _version == *version *//*@ \label{line:rcuSpecReadInterfaceEnd} @*/
 201         void read(int *data, int *version) {
 202                 Node *res = node.load(mo_acquire);/*@ \label{line:rcuSpecReadLoad} @*/
 203                 /** @Ordering_Point_Label_Check: true/*@ \label{line:rcuSpecReadOPBegin} @*/
 204                 @Label: Read_Point */
 205                 *data = res->data;
 206                 *version = res->version;
 207         }
 208         /** @Interface: Update
 209         @Ordering_Point_Set: Update_Point/*@ \label{line:rcuSpecUpdateInterfaceOPSet} @*/
 210         @ID: this/*@ \label{line:rcuSpecUpdateInterfaceID} @*/
 211         @SideEffect: _data = data; _version++; *//*@ \label{line:rcuSpecUpdateInterfaceSideEffect} @*/
 212         void update(int data) {
 213                 bool succ = false;
 214                 Node *newNode = new Node;/*@ \label{line:rcuSpecUpdateAlloc} @*/
 215                 Node *prev = node.load(mo_acquire);/*@ \label{line:rcuSpecUpdateLoad} @*/
 216                 do {
 217                         newNode->data = data;
 218                         newNode->version = prev->version + 1;
 219                         succ = node.compare_exchange_strong(prev,/*@\label{line:rcuSpecUpdateCAS} @*/
 220                                 newNode, mo_acq_rel, mo_acquire);
 221                         /** @Ordering_Point_Label_Check: succ/*@ \label{line:rcuSpecUpdateOPBegin} @*/
 222                         @Label: Update_Point */
 223                 } while (!succ);
 224         }
 225 };
 226
 227 \end{lstlisting}
 228 \vspace{-.2cm}
 229 \caption{\label{fig:rcuSpecExample}Annotated RCU specification example}
 230 \vspace{-.3cm}
 231 \end{figure}
 232
 233 \subsection{Defining the Equivalent Sequential History}
 234
 235 While defining the equivalent sequential history for concurrent executions is well studied in the context of the
 236 SC memory model, optimizations that developers typically use in the
 237 context of weaker memory models create the following new challenges when
 238 we generate the equivalent sequential history using ordering points.
 239
 240 \squishlist
 241 \item {\bf Absence of a Meaningful Trace or Total Order:}
 242 For the SC memory model, an execution can be represented by a simple
 243 interleaving of all the memory operations, where each load operation reads from
 244 the last store operation to the same location in the trace.  However, under a
 245 relaxed memory model like C/C++11, the interleaving does not uniquely define an
 246 execution because a given load operation can potentially read from many
 247 different store operations in the interleaving. Therefore, we have to rely on
 248 the intrinsic relations between ordering points such as \textit{reads from} and
 249 \textit{modification order} to order method calls.
 250
 251 \item {\bf Lack of Program Order Preserving Sequential History:}
 252 Moreover, as discussed in Section~\ref{sec:exampleMotivation}, in
 253 general it is not possible to arrange an execution in any totally
 254 ordered sequential history that preserves program order.
 255
 256 A key insight is that many concurrent data structures' API methods
 257 have a commit point, which is a single memory operation that makes the
 258 update visible to other threads and that also serves as an ordering
 259 point.  When two data structure operations have a dependence, it is
 260 often the case that their respective commit points are both
 261 conflicting accesses to the same memory location.  In this case, the
 262 modification order provided by C/C++ is sufficient to order these
 263 operations since modification order is guaranteed to be consistent
 264 with the happens before relation (and therefore also the sequenced
 265 before relation).
 266
 267 For cases where the method calls are independent, such as
 268 a \code{put(X, 1)} followed by a
 269 \code{get(Y)} in a hashtable where \code{X} and \code{Y} are different keys,
 270 the lack of an ordering is not a problem since those methods commute.
 271 \squishend
 272
 273 \mypara{\bf Ordering Points Annotations:} In many cases, it is not
 274 possible to determine whether a given atomic operation is an ordering
 275 point until later in the execution. For example,
 276 some internal methods may be called by multiple API methods. In this
 277 case, the same atomic operation can be identified as a potential
 278 ordering point for multiple API methods, and each API method later has
 279 a checking annotation to verify whether it was a real ordering
 280 point.  Therefore, the \TOOL specification separates the definition of ordering
 281 points as follows:
 282 \squishcount
 283 \item {\code{Potential\_Ordering\_Point} annotation:} The labeling of ordering
 284 points that identifies the location of a potential ordering point.
 285 \item {\code{Ordering\_Point\_Check} annotation:} The checking of ordering
 286 points that checks at a later point whether a potential ordering point was
 287 really an ordering point.
 288 \countend
 289
 290 These two constructs together identify ordering points.  For
 291 example, assume that $A$ is an atomic operation that is potentially an
 292 ordering point under some specific
 293 condition.  The developer would then write a
 294 \code{Potential\_Ordering\_Point} annotation with a condition
 295 \code{ConditionA} and a label
 296 \code{LabelA}, and then use the label \code{LabelA} in an
 297 \code{Ordering\_Point\_Check} annotation at a later point.
 298
 299 The \code{Ordering\_Point\_Label\_Check} annotation combines
 300 the \code{Potential\_Ordering\_Point} and the
 301 \code{Ordering\_Point\_Check} annotations, and makes
 302 specifications simpler for the use case where the ordering point is known immediately.
 303 For example, in Line~\ref{line:rcuSpecReadOPBegin} of
 304 Figure~\ref{fig:rcuSpecExample}, we use the \code{Ordering\_Point\_Label\_Check} annotation to
 305 identify the ordering point for \code{read} because we know the load operation
 306 in Line~\ref{line:rcuSpecReadLoad} is an ordering point at the time it is
 307 executed.
 308
 309 Some data structure operations may require multiple ordering points.
 310 For example, consider a transaction implementation that first attempts
 311 to lock all of the involved objects (dropping the locks and retrying
 312 if it fails to acquire a lock), performs the updates, and then releases
 313 the locks.  To order such transactions in a relaxed memory model, we must consider all of the
 314 locks it acquires rather than just the last lock.  Thus, we allow a
 315 method invocation to have more than one ordering point, and the
 316 additional ordering points serve to order the operation with respect to
 317 multiple different memory locations.  For the transaction example, it
 318 may be necessary to retry the acquisition of locks.  To support this
 319 scenario, the \code{Ordering\_Point\_Clear} annotation removes all
 320 previous ordering points when it satisfies a specific condition.
 321
 322 Moreover, when an API method calls another API method, they can share ordering points.  In
 323 that case, \TOOL requires that at that ordering point, the concurrent data
 324 structure should satisfy the precondition and postcondition of both API methods.
 325
 326 \subsection{Checking the IO Behavior of Methods}
 327
 328 With the specified ordering points, \TOOL is able to generate the equivalent
 329 sequential history. Developers then need to define the equivalent sequential
 330 data structure. For example, in Line~\ref{line:declVar} and~\ref{line:initVar} of the annotated RCU example, we use the
 331 \code{Structure\_define} annotation to define the equivalent sequential RCU
 332 by specifying the internal states as two integers, \code{\_version} and \code{\_data}. In the
 333 \code{Structure\_define} annotation, developers can also specify definitions for customized
 334 structs and methods for convenience. We design these annotations in such a way
 335 that developers can
 336 write specifications in C/C++ code such that they do not have to learn a new
 337 specification language.
 338
 339 After defining the internal states and methods of the
 340 equivalent data structure, developers use the \code{SideEffect} annotation to
 341 define the corresponding sequential API methods, which should contain the action
 342 to be performed on the equivalent sequential data structure. For example, in
 343 Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} of the annotated RCU example,
 344 we use \code{SideEffect} to specify that when we execute the \code{update}
 345 method on the equivalent sequential RCU, we should update the
 346 internal states of \code{\_version} and \code{\_data} accordingly. When
 347 the \code{SideEffect} annotation is omitted for a specific API method, it means
 348 that no side effects will happen on the sequential data structure when that
 349 method is called. Take the annotated RCU as an example, the \code{read} has no
 350 side effects on the equivalent sequential RCU.
 351
 352 With the well-defined equivalent sequential data structure, developers then relate
 353 the generated equivalent sequential history to the equivalent sequential data
 354 structure. In \TOOL, we allow developers to accomplish this by using the
 355 \code{PreCondition} and \code{PostCondition} annotations to specify the
 356 conditions to be checked before and after the API method appears to happen.
 357 For example, Line~\ref{line:rcuSpecReadInterfaceEnd} in the
 358 annotated RCU example means that when \code{read} appears to happen, it should
 359 return the same value as the current internal variables of the equivalent
 360 sequential RCU.
 361 Note that these two annotations contain legitimate C/C++
 362 expressions that only access the method call parameters, return value and the
 363 internal states of the equivalent sequential data structure.
 364
 365 \subsection{Checking Synchronization}
 366
 367 Under a relaxed memory model, compilers and processors can reorder
 368 memory operations and thus the execution can exhibit counter-intuitive
 369 behaviors. The C/C++11 memory model provides developers with memory ordering that establish synchronization, e.g., \code{acquire}, \code{release}, \code{seq\_cst}.  Synchronization
 370 serves to control which reorderings are allowed --- however,
 371 restricting reorderings comes at a runtime cost so developers must
 372 balance complexity against runtime overhead.  Checking that data
 373 structures establish proper synchronization is important to ensure
 374 that the data structures can be effectively composed with client code.
 375
 376 We generalize the notion of happens before to methods as follows.
 377 Method call $c_1$ happens-before method call $c_2$ if the invocation
 378 event of $c_1$ happens before the response event of $c_2$.  Note that
 379 by this definition two method calls can both happen before each other
 380 --- an example of this is the \code{barrier} synchronization
 381 construct.  With this notion, for example, for a correctly synchronized queue, we want
 382 an enqueue to happen before the corresponding dequeue, which avoids
 383 the synchronization problems discussed earlier in
 384 Section~\ref{sec:introNewChallenges}.
 385
 386 In order to flexibly express the synchronization between methods, we associate
 387 API methods with method call identifiers (or IDs) and happens-before conditional guard
 388 expressions. The method
 389 call ID is a C/C++ expression that computes a unique ID for the call, and if it
 390 is omitted, a default value is used. For example, in our RCU example in
 391 Figure~\ref{fig:rcuSpecExample}, both the \code{update} and \code{read} methods
 392 have the \code{this} pointer of the corresponding RCU object.
 393 The \code{HB\_Condition} component
 394 associates one happens-before conditional guard expression with a unique label.
 395 For one method, multiple conditional guard expressions are allowed to be
 396 defined, and the conditional guard expression can only access the method
 397 instance's argument values and return value.
 398
 399 After specifying the
 400 method call IDs and the \code{HB\_Condition} labels, developers
 401 can specify the synchronization as ``\code{\small{method1(HB\_condition1)}}
 402 \code{\small{->}}
 403 \code{\small{method2(HB\_condition2)}}''. When the \code{HB\_condition} is omitted, it
 404 defaults to \code{true}. The semantics of this expression is that all
 405 instances of calls to \code{method1} that satisfy the conditional guard
 406 expression \code{HB\_condition1} should happen-before all later instances (as determined by ordering points) of
 407 calls to \code{method2} that satisfy the conditional guard expression
 408 \code{HB\_condition2} such that both instances shared the same ID.
 409 The ID and happens-before conditional guard expression are important because they
 410 allow developers to impose synchronization only between specific method
 411 invocations under specific conditions. For example, in
 412 Figure~\ref{fig:rcuSpecExample}, Line~\ref{line:rcuSpecHB} specifies two
 413 synchronization rules, which together mean that the
 414 \code{update} should only establish synchronization with later \code{read} and
 415 \code{update} from
 416 the same RCU object under all circumstances.