writeup/.#specification.tex.1.47

   1 \mysection{Specification Language Design}\label{sec:specification}
   2
   3 We begin by briefly overviewing \TOOL's basic approach.  The \TOOL
   4 specification language specifies the correctness properties for
   5 concurrent data structures by establishing a correspondence with an
   6 equivalent sequential data structure, which requires: (1) defining the
   7 equivalent sequential data structure; (2) establishing an equivalent
   8 sequential history; (3) relating the behavior of the concurrent
   9 execution to the sequential history; and (4) specifying the
  10 synchronization properties between method invocations. The \TOOL
  11 specification language has the following key components:
  12
  13 \mypara{\bf 1. Equivalent Sequential Data Structure:} This component defines a
  14 sequential data structure against which the concurrent data structure
  15 will be checked.
  16
  17 \mypara{\bf 2. Defining the Equivalent Sequential History:}
  18 For real-world data structures, generally there exist two types of API methods,
  19 \textit{primitive} API methods and \textit{aggregate} API methods. Take
  20 concurrent hashtables as an example, it provides primitive API methods
  21 such as \code{put} and \code{get} that implement the core functionality of the
  22 data structure, and it also has aggregate API methods such
  23 as \code{putAll} that calls primitive API methods internally. From our
  24 experience working with our benchmark set, it is generally possible to
  25 generate a sequential history of primitive API method invocations for
  26 real-world data structures as they generally serialize on a single memory
  27 location --- while it is possible to observe partially completed aggregate
  28 API method invocations.  Therefore, our specifications will focus on the
  29 correctness of primitive API methods.
  30
  31 Borrowing from VYRD~\cite{vyrd} the concept of commit points, we allow
  32 developers to specify \textit{ordering points} --- the specific atomic
  33 operations between method invocation and response events that are used for
  34 ordering method calls.  Ordering points are a generalization of the commit
  35 points as they also use memory operations that do not commit the data structure
  36 operation to order method calls. Developers then specify ordering points to
  37 generate the equivalent sequential history.
  38
  39 \mypara{\bf 3. Specifying Correct Behaviors:} Developers use
  40 the \TOOL specification to provide a set of \textit{side effects}
  41 and \textit{assertions} to describe the desired behavior of each API
  42 method.  Side effects capture how a method updates the equivalent
  43 sequential data structure.  Assertions include both preconditions and
  44 postconditions which specify what conditions each method must satisfy
  45 before and after the method call, respectively. The specification of
  46 the side effects and assertions may make use of the states of the
  47 equivalent sequential data structure, meaning that these components
  48 can access the internal variables and methods of the equivalent sequential data
  49 structure. Additionally, they can reference the values available at
  50 the method invocation and response, i.e., the parameter values and the
  51 return value.
  52
  53 \mypara{\bf 4. Synchronization:} The synchronization  specification
  54 describes the desired happens before relations between API method
  55 invocations.  \TOOL specifications allow developers to specify the
  56 properties at the abstraction of methods.  This makes specifications
  57 cleaner, more understandable, and less error-prone because the
  58 properties do not rely on low-level implementation
  59 details. Moreover, \TOOL specifications allow developers to attach
  60 conditions to methods when specifying synchronization properties so
  61 that a method call might synchronize with another only under a
  62 specific condition.  For example, consider a spin lock with
  63 a \code{try\_lock()} method, we need to specify that only a
  64 successful \code{try\_lock()} method invocation must synchronize with the
  65 previous \code{unlock()} method invocation.
  66
  67 Figure~\ref{fig:speccfg} presents the grammar for the \TOOL specification
  68 language.  The grammar defines three types of specification annotations:
  69 \textit{structure annotations}, \textit{method annotations}, and
  70 \textit{ordering point annotations}.
  71 In the grammar, \textit{code} means legal C/C++ source code; and
  72 \textit{label} means legal C/C++ variable name.
  73 Annotations are embedded in C/C++ comments.
  74 This format does not affect the semantics of the original
  75 program and allows for the same source to be used by both a standard C/C++
  76 compiler to generate production code and for the \TOOL specification compiler to
  77 extract the \TOOL specification from the comments.  We discuss these constructs in more detail throughout the remainder of this section.
  78
  79 \begin{figure}[!tb]
  80 \vspace{-.2cm}
  81 {\scriptsize
  82         \begin{eqnarray*}
  83         \textit{\textbf{structureSpec}} & \rightarrow & \code{"@Structure\_Define"}\\
  84  && \textit{structureDefine} \ \ \ (\textit{happensBefore})?\\
  85         \textit{structureDefine} & \rightarrow &
  86 (\code{"@DeclareStruct:"} \
  87         \textit{code})\text{*}\\
  88         && \code{"@DeclareVar:"} \ \textit{code}\\
  89         && \code{"@InitVar:"} \ \textit{code}\\
  90         && (\code{"@DefineFunc:"} \ \textit{code})\text{*}\\
  91         \textit{happensBefore} & \rightarrow & \code{"@Happens\_Before:"} \
  92         (\textit{invocation} \ \code{"->"} \
  93         \textit{invocation})\textsuperscript{+}\\
  94         \textit{invocation} & \rightarrow & \textit{label} \ (\ \  \code{"("} \
  95         \textit{label} \code{")"} \ \ )?\\
  96         \textit{\textbf{methodSpec}} & \rightarrow & \code{"@Method:"} \ \textit{label}\\
  97         && \code{"@Ordering\_Point\_Set:"} \ \textit{label} \ (\code{"|"} \
  98         \textit{label})\text{*}\\
  99         && (\code{"@HB\_Condition:"} \ \textit{label} \ \code{"::"} \ \textit{code})\text{*}\\
 100         && (\code{"@ID:"} \ \textit{code})?\\
 101         && (\code{"@PreCondition:"} \ \textit{code})?\\
 102         && (\code{"@SideEffect:"} \ \textit{code})?\\
 103         && (\code{"@PostCondition:"} \ \textit{code})?\\
 104         \textit{\textbf{potentialOP}} & \rightarrow & \code{"@Potential\_Ordering\_Point:"} \ \textit{code}\\
 105         && \code{"@Label:"} \ \ \textit{label}\\
 106         \textit{\textbf{opCheck}} & \rightarrow & \code{"@Ordering\_Point\_Check:"} \ \textit{code}\\
 107         && \code{"@Potential\_Ordering\_Point\_Label:"} \ \textit{label}\\
 108         && \code{"@Label:"} \ \ \textit{label}\\
 109         \textit{\textbf{opLabelCheck}} & \rightarrow & \code{"@Ordering\_Point\_Label\_Check:"} \ \textit{code}\\
 110         && \code{"@Label:"} \ \ \textit{label}\\
 111         \textit{\textbf{opClear}} & \rightarrow & \code{"@Ordering\_Point\_Clear:"} \ \textit{code}\\
 112         && \code{"@Label:"} \ \ \textit{label}\\
 113         \textit{code} & \rightarrow & \code{<Legal C/C++ code>}
 114         \end{eqnarray*}}
 115 \vspace{-.7cm}
 116 \caption{\label{fig:speccfg}Grammar for \TOOL specification language}
 117 \vspace{-.3cm}
 118 \end{figure}
 119
 120
 121 \mysubsection{Example}
 122
 123 To make the \TOOL specification language more concrete,
 124 Figure~\ref{fig:rcuSpecExample} presents the annotated RCU example. We
 125 use {\it ordering points} to order the method invocations to construct
 126 the equivalent sequential history for the concurrent execution.  We
 127 first specify the ordering points for the \code{read}
 128 and \code{update} methods to help define the equivalent sequential
 129 history. For the
 130 \code{read} method, the load access of the \code{node} field in
 131 Line~\ref{line:rcuSpecReadLoad} is the only operation that can be an ordering
 132 point.  For the \code{update} method, there exists more than one atomic
 133 operation, however they only serve as an ordering point when it successfully uses the CAS operation
 134 to update the \code{node} field. Thus, we specify in
 135 Line~\ref{line:rcuSpecUpdateOPBegin} that the CAS operation should be the
 136 ordering point for the update operation when \code{succ} is \code{true}. We
 137 associate the methods with the two ordering points in
 138 Line~\ref{line:rcuSpecReadInterfaceOPSet} and
 139 \ref{line:rcuSpecUpdateInterfaceOPSet}.
 140
 141 Next, we specify the equivalent sequential data structure for this RCU implementation.
 142 Line~\ref{line:declVar} declares two integer fields \code{\_data} and
 143 \code{\_version} as the internal states of the equivalent sequential RCU, and
 144 Line~\ref{line:initVar} initializes both variables to \code{0}.
 145
 146 We then specify the correct behaviors of the equivalent sequential
 147 history by defining the side effects and assertions for
 148 the \code{read} and \code{update} methods.  Side effects specify how
 149 to perform the corresponding operation on the equivalent sequential
 150 data structure.  For example,
 151 Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} specifies the side
 152 effects of the \code{update} to the sequential states.
 153 Assertions specify properties that should be true of the concurrent data structure execution.
 154 For example, Line~\ref{line:rcuSpecReadInterfacePostCondition}
 155 specifies that the \code{read} method should satisfy the postcondition that the
 156 returned \code{data} and \code{version} fields should have consistent values
 157 as the internal state of the equivalent sequential data structure.
 158
 159 Our implementation of the RCU data structure is designed to establish
 160 synchronization between the \code{update} method invocation and the later
 161 \code{read} or \code{update} method invocation.  This is important, for
 162 example, if a client thread were to update an array index and use the RCU data
 163 structure to communicate the updated index to a second client thread.  Without
 164 synchronization, the second client thread may not see the update to the array
 165 index. Besides, the synchronization between two \code{update} calls ensures that
 166 later \code{read} will see the most updated values.
 167 In order to specify the synchronization, we associate \code{read} and \code{update} methods with method call identifiers
 168 (Line~\ref{line:rcuSpecReadInterfaceID} and Line~\ref{line:rcuSpecUpdateInterfaceID}), which for this example is the value of the
 169 \code{this} pointer.
 170 Together these annotations ensure that every
 171 API method call on the same RCU object will have the same method call ID.
 172 Line~\ref{line:rcuSpecHB} then specifies
 173 the synchronization property for the RCU --- any \code{update} method invocation
 174 should synchronize with all later \code{read} and \code{update} method invocations that have the
 175 same method call identifier as that \code{update} method. This guarantees that
 176 \code{update} calls should synchronize with the \code{read} and \code{update}
 177 calls of the same RCU object.
 178
 179
 180 \begin{figure}[h!]
 181 \vspace{-.2cm}
 182 \begin{lstlisting}[xleftmargin=6.0ex]
 183 class RCU {
 184         /** @Structure_Define:
 185                 @DeclareVar:/*@ \label{line:declVar} @*/ int _data, _version;
 186                 @InitVar:/*@ \label{line:initVar} @*/ _data = 0; _version = 0;
 187         @Happens_Before: Update->Read  Update->Update*//*@ \label{line:rcuSpecHB} @*/
 188         atomic<Node*> node;
 189         public:
 190         RCU() {
 191                 Node *n = new Node;
 192                 n->data = 0;
 193                 n->version = 0;
 194                 atomic_init(&node, n);
 195         }
 196         /** @Interface: Read/*@ \label{line:rcuSpecReadInterfaceLabel} @*/
 197         @Ordering_Point_Set: Read_Point/*@ \label{line:rcuSpecReadInterfaceOPSet} @*/
 198         @ID: this/*@ \label{line:rcuSpecReadInterfaceID} @*/
 199         @PostCondition:/*@ \label{line:rcuSpecReadInterfacePostCondition} @*/
 200                 _data == *data && _version == *version *//*@ \label{line:rcuSpecReadInterfaceEnd} @*/
 201         void read(int *data, int *version) {
 202                 Node *res = node.load(mo_acquire);/*@ \label{line:rcuSpecReadLoad} @*/
 203                 /** @Ordering_Point_Label_Check: true/*@ \label{line:rcuSpecReadOPBegin} @*/
 204                 @Label: Read_Point */
 205                 *data = res->data;
 206                 *version = res->version;
 207         }
 208         /** @Interface: Update
 209         @Ordering_Point_Set: Update_Point/*@ \label{line:rcuSpecUpdateInterfaceOPSet} @*/
 210         @ID: this/*@ \label{line:rcuSpecUpdateInterfaceID} @*/
 211         @SideEffect: _data = data; _version++; *//*@ \label{line:rcuSpecUpdateInterfaceSideEffect} @*/
 212         void update(int data) {
 213                 bool succ = false;
 214                 Node *newNode = new Node;/*@ \label{line:rcuSpecUpdateAlloc} @*/
 215                 Node *prev = node.load(mo_acquire);/*@ \label{line:rcuSpecUpdateLoad} @*/
 216                 do {
 217                         newNode->data = data;
 218                         newNode->version = prev->version + 1;
 219                         succ = node.compare_exchange_strong(prev,/*@\label{line:rcuSpecUpdateCAS} @*/
 220                                 newNode, mo_acq_rel, mo_acquire);
 221                         /** @Ordering_Point_Label_Check: succ/*@ \label{line:rcuSpecUpdateOPBegin} @*/
 222                         @Label: Update_Point */
 223                 } while (!succ);
 224         }
 225 };
 226
 227 \end{lstlisting}
 228 \vspace{-.2cm}
 229 \caption{\label{fig:rcuSpecExample}Annotated RCU specification example}
 230 \vspace{-.3cm}
 231 \end{figure}
 232
 233 \subsection{Defining the Equivalent Sequential History}
 234
 235 While defining the equivalent sequential history for concurrent executions is well studied in the context of the
 236 SC memory model, optimizations that developers typically use in the
 237 context of weaker memory models create the following new challenges when
 238 we generate the equivalent sequential history using ordering points.
 239
 240 \begin{itemize}
 241 \item {\bf Absence of a Meaningful Trace or Total Order:}
 242 For the SC memory model, an execution can be represented by a simple
 243 interleaving of all the memory operations, where each load operation reads from
 244 the last store operation to the same location in the trace.  However, under a
 245 relaxed memory model like C/C++11, the interleaving does not uniquely define an
 246 execution because a given load operation can potentially read from many
 247 different store operations in the interleaving. Therefore, we have to rely on
 248 the intrinsic relations between ordering points such as \textit{reads from} and
 249 \textit{modification order} to order method calls.
 250
 251 \item {\bf Lack of Program Order Preserving Sequential History:}
 252 Moreover, as discussed in Section~\ref{sec:exampleMotivation}, in
 253 general it is not possible to arrange an execution in any totally
 254 ordered sequential history that preserves program order.
 255
 256 A key insight is that many concurrent data structures' API methods
 257 have a commit point, which is a single memory operation that makes the
 258 update visible to other threads and that also serves as an ordering
 259 point.  When two data structure operations have a dependence, it is
 260 often the case that their respective commit points are both
 261 conflicting accesses to the same memory location.  In this case, the
 262 modification order provided by C/C++ is sufficient to order these
 263 operations since modification order is guaranteed to be consistent
 264 with the happens before relation (and therefore also the sequenced
 265 before relation).
 266
 267 For cases where the method calls are independent, such as
 268 a \code{put(X, 1)} followed by a
 269 \code{get(Y)} in a hashtable where \code{X} and \code{Y} are different keys,
 270 the lack of an ordering is not a problem since those methods commute.
 271 \end{itemize}
 272
 273 \mypara{\bf Ordering Point Annotations:} In many cases, it is not
 274 possible to determine whether a given atomic operation is an ordering
 275 point until later in the execution. For example,
 276 some internal methods may be called by multiple API methods. In this
 277 case, the same atomic operation can be identified as a potential
 278 ordering point for multiple API methods, and each API method later has
 279 a checking annotation to verify whether it was a real ordering
 280 point.  Therefore, the \TOOL specification separates the definition of ordering
 281 points as follows:
 282 \begin{enumerate}
 283 \vspace{-.1cm}
 284 \item {\code{Potential\_Ordering\_Point} annotation:} The labeling of ordering
 285 points that identifies the location of a potential ordering point.
 286 \vspace{-.2cm}
 287 \item {\code{Ordering\_Point\_Check} annotation:} The checking of ordering
 288 points that checks at a later point whether a potential ordering point was
 289 really an ordering point.
 290 \end{enumerate}
 291 \vspace{-.1cm}
 292
 293 These two constructs together identify ordering points.  For
 294 example, assume that $A$ is an atomic operation that is marked as a
 295 potential ordering point with the label \code{LabelA} under the
 296 condition \code{ConditionA}.  The developer would write a
 297 \code{Potential\_Ordering\_Point} annotation with the label
 298 \code{LabelA} and then use the label \code{LabelA} in an
 299 \code{Ordering\_Point\_Check} annotation at a later point.
 300
 301 The \code{Ordering\_Point\_Label\_Check} annotation combines
 302 the \code{Potential\_Ordering\_Point} and the
 303 \code{Ordering\_Point\_Check} annotations, and makes
 304 specifications simpler for the use case that the ordering point is known immediately.
 305 For example, in Line~\ref{line:rcuSpecReadOPBegin} of
 306 Figure~\ref{fig:rcuSpecExample}, we use the \code{Ordering\_Point\_Label\_Check} annotation to
 307 identify the ordering point for \code{read} because we know the load operation
 308 in Line~\ref{line:rcuSpecReadLoad} is an ordering point at the time it is
 309 executed.
 310
 311 Some data structure operations may require multiple ordering points.
 312 For example, consider a transaction implementation that first attempts
 313 to lock all of the involved objects (dropping the locks and retrying
 314 if it fails to acquire a lock), performs the updates, and then releases
 315 the locks.  To order such transactions in a relaxed memory model, we must consider all of the
 316 locks it acquires and not just the last lock.  Thus, we allow a
 317 method invocation to have more than one ordering point, and the
 318 additional ordering points serve to order the operation with respect to
 319 multiple different memory locations.  For the transaction example, it
 320 may be necessary to retry the acquisition of locks.  To support this
 321 scenario, the \code{Ordering\_Point\_Clear} annotation removes all
 322 previous ordering points when it satisfies a specific condition.
 323
 324 Moreover, when an API method calls another API method, they can share ordering points.  In
 325 that case, \TOOL requires that at that ordering point, the concurrent data
 326 structure should satisfy the precondition and postcondition of both the inner
 327 and outer API methods.
 328
 329 \subsection{Checking the IO Behavior of Methods}
 330
 331 With the specified ordering points, \TOOL is able to generate the equivalent
 332 sequential history. Developers then need to define the equivalent sequential
 333 data structure. For example, in Line~\ref{line:declVar} and
 334 ~\ref{line:initVar} of the annotated RCU example, we use the
 335 \code{Structure\_define} annotation to define the equivalent sequential RCU
 336 by specifying the internal states as two integers, \code{\_verion} and \code{\_data}. In the
 337 \code{Structure\_define} annotation, developers can also specify definitions for customized
 338 structs and methods for convenience. We design these annotations in such a way
 339 that developers can
 340 write specifications in C/C++ code such that they do not have to learn a new
 341 specification language.
 342
 343 After defining the internal states and methods of the
 344 equivalent data structure, developers use the \code{SideEffect} annotation to
 345 define the corresponding sequential API methods, which should contain the action
 346 to be performed on the equivalent sequential data structure. For example, in
 347 Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} of the annotated RCU example,
 348 we use \code{SideEffect} to specify that when we execute the \code{update}
 349 method on the equivalent sequential RCU, we should update the
 350 internal states of \code{\_version} and \code{\_data} accordingly. When
 351 the \code{SideEffect} annotation is omitted for a specific API method, it means
 352 that no side effects will happen on the sequential data structure when that
 353 method is called. Take the annotated RCU as an example, the \code{read} has no
 354 side effects on the equivalent sequential RCU.
 355
 356 With the well-defined equivalent sequential data structure, developers then relate
 357 the generated equivalent sequential history to the equivalent sequential data
 358 structure. In \TOOL, we allow developers to accomplish this by using the
 359 \code{PreCondition} and \code{PostCondition} annotations to specify the
 360 conditions to be checked before and after the API method appears to happen.
 361 For example, Line~\ref{line:rcuSpecReadInterfaceEnd} in the
 362 annotated RCU example means that when \code{read} appears to happen, it should
 363 return the same value as the current interal variables of the equivalent
 364 sequential RCU.
 365 Note that these two annotations contain legitimate C/C++
 366 expressions that only access the method call parameters, return value and the
 367 internal states of the equivalent sequential data structure.
 368
 369 \subsection{Checking Synchronization}
 370
 371 Under a relaxed memory model, compilers and processors can reorder
 372 memory operations and thus the execution can exhibit counter-intuitive
 373 behaviors. The C/C++11 memory model provides the developer with memory ordering that establish synchronization, e.g., \code{acquire}, \code{release}, \code{seq\_cst}.  Synchronization
 374 serves to control which reorderings are allowed --- however,
 375 restricting reorderings comes at a runtime cost so developers must
 376 balance complexity against runtime overhead.  Checking that data
 377 structures establish proper synchronization is important to ensure
 378 that the data structures can be effectively composed with client code.
 379
 380 We generalize the notion of happens before to methods as follows.
 381 Method call $c_1$ happens-before method call $c_2$ if the invocation
 382 event of $c_1$ happens before the response event of $c_2$.  Note that
 383 by this definition two method calls can both happen before each other
 384 --- an example of this is the \code{barrier} synchronization
 385 construct.  For example, for a correctly synchronized queue, we want
 386 an enqueue to happen before the corresponding dequeue, which avoids
 387 the synchronization problems discussed earlier in
 388 Section~\ref{sec:introNewChallenges}.
 389
 390 In order to flexibly express the synchronization between methods, we associate
 391 API methods with method call identifiers (or IDs) and happens-before conditional guard
 392 expressions. The method
 393 call ID is a C/C++ expression that computes a unique ID for the call, and if it
 394 is omitted, a default value is used. For example, in our RCU example in
 395 Figure~\ref{fig:rcuSpecExample}, method \code{update} and \code{read} both
 396 have the \code{this} pointer of the corresponding RCU object.
 397 The \code{HB\_Condition} component
 398 associates one happens-before conditional guard expression with a unique label.
 399 For one method, multiple conditional guard expressions are allowed to be
 400 defined, and the conditional guard expression can only access the method
 401 instance's argument values and return value.
 402
 403 After specifying the
 404 method call IDs and the \code{HB\_Condition} labels, developers
 405 can specify the synchronization as ``\code{method1(HB\_condition1)} \code{->}
 406 \code{method2(HB\_condition2)}''. When the \code{HB\_condition} is omitted, it
 407 defaults to \code{true}. The semantics of this expression is that all
 408 instances of calls to \code{method1} that satisfy the conditional guard
 409 expression \code{HB\_condition1} should happen-before all later instances (as determined by ordering points) of
 410 calls to \code{method2} that satisfy the conditional guard expression
 411 \code{HB\_condition2} such that both instances shared the same ID.
 412 The ID and happens-before conditional guard expression are important because they
 413 allow developers to impose synchronization only between specific method
 414 invocations under specific conditions. For example, in
 415 Figure~\ref{fig:rcuSpecExample}, Line~\ref{line:rcuSpecHB} specifies two
 416 synchronization rules, which together mean that the
 417 \code{update} should only establish synchronization with later \code{read} and
 418 \code{update} from
 419 the same RCU object under all circumstances.