\mysection{Specification Language Design}\label{sec:specification}

We begin by briefly overviewing \TOOL's basic approach.  The \TOOL
specification language specifies the correctness properties for
concurrent data structures by establishing a correspondence with an
equivalent sequential data structure, which requires: (1) defining the
equivalent sequential data structure; (2) establishing an equivalent
sequential history; (3) relating the behavior of the concurrent
execution to the sequential history; and (4) specifying the
synchronization properties between method invocations. The \TOOL
specification language has the following key components:

\mypara{\bf 1. Equivalent Sequential Data Structure:} This component defines a 
sequential data structure against which the concurrent data structure
will be checked.

\mypara{\bf 2. Defining the Equivalent Sequential History:}
For real-world data structures, generally there exist two types of API methods,
\textit{primitive} API methods and \textit{aggregate} API methods. Take
concurrent hashtables as an example, it provides primitive API methods
such as \code{put} and \code{get} that implement the core functionality of the
data structure, and it also has aggregate API methods such
as \code{putAll} that calls primitive API methods internally. From our
experience working with our benchmark set, it is generally possible to
generate a sequential history of primitive API method invocations for
real-world data structures as they generally serialize on a single memory
location --- while it is possible to observe partially completed aggregate
API method invocations.  Therefore, our specifications will focus on the
correctness of primitive API methods.

Borrowing from VYRD~\cite{vyrd} the concept of commit points, we allow
developers to specify \textit{ordering points} --- the specific atomic
operations between method invocation and response events that are used for
ordering method calls.  Ordering points are a generalization of the commit
points as they also use memory operations that do not commit the data structure
operation to order method calls. Developers then specify ordering points to
generate the equivalent sequential history.

\mypara{\bf 3. Specifying Correct Behaviors:} Developers use
the \TOOL specification to provide a set of \textit{side effects}
and \textit{assertions} to describe the desired behavior of each API
method.  Side effects capture how a method updates the equivalent
sequential data structure.  Assertions include both preconditions and
postconditions which specify what conditions each method must satisfy
before and after the method call, respectively. The specification of
the side effects and assertions may make use of the states of the
equivalent sequential data structure, meaning that these components
can access the internal variables and methods of the equivalent sequential data
structure. Additionally, they can reference the values available at
the method invocation and response, i.e., the parameter values and the
return value.

\mypara{\bf 4. Synchronization:} The synchronization  specification
describes the desired happens before relations between API method
invocations.  \TOOL specifications allow developers to specify the
properties at the abstraction of methods.  This makes specifications
cleaner, more understandable, and less error-prone because the
properties do not rely on low-level implementation
details. Moreover, \TOOL specifications allow developers to attach
conditions to methods when specifying synchronization properties so
that a method call might synchronize with another only under a
specific condition.  For example, consider a spin lock with
a \code{try\_lock()} method, we need to specify that only a
successful \code{try\_lock()} method invocation must synchronize with the
previous \code{unlock()} method invocation.

Figure~\ref{fig:speccfg} presents the grammar for the \TOOL specification
language.  The grammar defines three types of specification annotations: 
\textit{structure annotations}, \textit{method annotations}, and
\textit{ordering point annotations}.
In the grammar, \textit{code} means legal C/C++ source code; and
\textit{label} means legal C/C++ variable name.
Annotations are embedded in C/C++ comments.
This format does not affect the semantics of the original
program and allows for the same source to be used by both a standard C/C++
compiler to generate production code and for the \TOOL specification compiler to
extract the \TOOL specification from the comments.  We discuss these constructs in more detail throughout the remainder of this section.

\begin{figure}[!tb]
\vspace{-.2cm}
{\scriptsize
	\begin{eqnarray*}
	\textit{\textbf{structureSpec}} & \rightarrow & \code{"@Structure\_Define"}\\ 
 && \textit{structureDefine} \ \ \ (\textit{happensBefore})?\\
	\textit{structureDefine} & \rightarrow & 
(\code{"@DeclareStruct:"} \ 
	\textit{code})\text{*}\\
	&& \code{"@DeclareVar:"} \ \textit{code}\\
	&& \code{"@InitVar:"} \ \textit{code}\\
	&& (\code{"@DefineFunc:"} \ \textit{code})\text{*}\\
	\textit{happensBefore} & \rightarrow & \code{"@Happens\_Before:"} \
	(\textit{invocation} \ \code{"->"} \
	\textit{invocation})\textsuperscript{+}\\
	\textit{invocation} & \rightarrow & \textit{label} \ (\ \  \code{"("} \
	\textit{label} \code{")"} \ \ )?\\
	\textit{\textbf{methodSpec}} & \rightarrow & \code{"@Method:"} \ \textit{label}\\
	&& \code{"@Ordering\_Point\_Set:"} \ \textit{label} \ (\code{"|"} \
	\textit{label})\text{*}\\
	&& (\code{"@HB\_Condition:"} \ \textit{label} \ \code{"::"} \ \textit{code})\text{*}\\
	&& (\code{"@ID:"} \ \textit{code})?\\
	&& (\code{"@PreCondition:"} \ \textit{code})?\\
	&& (\code{"@SideEffect:"} \ \textit{code})?\\
	&& (\code{"@PostCondition:"} \ \textit{code})?\\
	\textit{\textbf{potentialOP}} & \rightarrow & \code{"@Potential\_Ordering\_Point:"} \ \textit{code}\\
	&& \code{"@Label:"} \ \ \textit{label}\\
	\textit{\textbf{opCheck}} & \rightarrow & \code{"@Ordering\_Point\_Check:"} \ \textit{code}\\
	&& \code{"@Potential\_Ordering\_Point\_Label:"} \ \textit{label}\\
	&& \code{"@Label:"} \ \ \textit{label}\\
	\textit{\textbf{opLabelCheck}} & \rightarrow & \code{"@Ordering\_Point\_Label\_Check:"} \ \textit{code}\\
	&& \code{"@Label:"} \ \ \textit{label}\\
	\textit{\textbf{opClear}} & \rightarrow & \code{"@Ordering\_Point\_Clear:"} \ \textit{code}\\
	&& \code{"@Label:"} \ \ \textit{label}\\
	\textit{code} & \rightarrow & \code{<Legal C/C++ code>}
	\end{eqnarray*}}
\vspace{-.7cm}
\caption{\label{fig:speccfg}Grammar for \TOOL specification language}
\vspace{-.3cm}
\end{figure}


\mysubsection{Example}

To make the \TOOL specification language more concrete,
Figure~\ref{fig:rcuSpecExample} presents the annotated RCU example. We
use {\it ordering points} to order the method invocations to construct
the equivalent sequential history for the concurrent execution.  We
first specify the ordering points for the \code{read}
and \code{update} methods to help define the equivalent sequential
history. For the
\code{read} method, the load access of the \code{node} field in
Line~\ref{line:rcuSpecReadLoad} is the only operation that can be an ordering
point.  For the \code{update} method, there exists more than one atomic
operation, however they only serve as an ordering point when it successfully uses the CAS operation
to update the \code{node} field. Thus, we specify in
Line~\ref{line:rcuSpecUpdateOPBegin} that the CAS operation should be the
ordering point for the update operation when \code{succ} is \code{true}. We
associate the methods with the two ordering points in 
Line~\ref{line:rcuSpecReadInterfaceOPSet} and
\ref{line:rcuSpecUpdateInterfaceOPSet}.

Next, we specify the equivalent sequential data structure for this RCU implementation.
Line~\ref{line:declVar} declares two integer fields \code{\_data} and
\code{\_version} as the internal states of the equivalent sequential RCU, and
Line~\ref{line:initVar} initializes both variables to \code{0}.

We then specify the correct behaviors of the equivalent sequential
history by defining the side effects and assertions for
the \code{read} and \code{update} methods.  Side effects specify how
to perform the corresponding operation on the equivalent sequential
data structure.  For example,
Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} specifies the side
effects of the \code{update} to the sequential states.
Assertions specify properties that should be true of the concurrent data structure execution.
For example, Line~\ref{line:rcuSpecReadInterfacePostCondition}
specifies that the \code{read} method should satisfy the postcondition that the
returned \code{data} and \code{version} fields should have consistent values
as the internal state of the equivalent sequential data structure.

Our implementation of the RCU data structure is designed to establish
synchronization between the \code{update} method invocation and the later
\code{read} or \code{update} method invocation.  This is important, for
example, if a client thread were to update an array index and use the RCU data
structure to communicate the updated index to a second client thread.  Without
synchronization, the second client thread may not see the update to the array
index. Besides, the synchronization between two \code{update} calls ensures that
later \code{read} will see the most updated values.
In order to specify the synchronization, we associate \code{read} and \code{update} methods with method call identifiers
(Line~\ref{line:rcuSpecReadInterfaceID} and Line~\ref{line:rcuSpecUpdateInterfaceID}), which for this example is the value of the
\code{this} pointer.
Together these annotations ensure that every
API method call on the same RCU object will have the same method call ID. 
Line~\ref{line:rcuSpecHB} then specifies
the synchronization property for the RCU --- any \code{update} method invocation
should synchronize with all later \code{read} and \code{update} method invocations that have the
same method call identifier as that \code{update} method. This guarantees that
\code{update} calls should synchronize with the \code{read} and \code{update}
calls of the same RCU object.


\begin{figure}[h!]
\vspace{-.2cm}
\begin{lstlisting}[xleftmargin=6.0ex]
class RCU {
	/** @Structure_Define:
		@DeclareVar:/*@ \label{line:declVar} @*/ int _data, _version;
		@InitVar:/*@ \label{line:initVar} @*/ _data = 0; _version = 0;
	@Happens_Before: Update->Read  Update->Update*//*@ \label{line:rcuSpecHB} @*/
	atomic<Node*> node;
	public:
	RCU() {
		Node *n = new Node;
		n->data = 0;
		n->version = 0;
		atomic_init(&node, n);	
	}
	/** @Interface: Read/*@ \label{line:rcuSpecReadInterfaceLabel} @*/
	@Ordering_Point_Set: Read_Point/*@ \label{line:rcuSpecReadInterfaceOPSet} @*/
	@ID: this/*@ \label{line:rcuSpecReadInterfaceID} @*/
	@PostCondition:/*@ \label{line:rcuSpecReadInterfacePostCondition} @*/
		_data == *data && _version == *version *//*@ \label{line:rcuSpecReadInterfaceEnd} @*/
	void read(int *data, int *version) {
		Node *res = node.load(mo_acquire);/*@ \label{line:rcuSpecReadLoad} @*/
		/** @Ordering_Point_Label_Check: true/*@ \label{line:rcuSpecReadOPBegin} @*/
		@Label: Read_Point */
		*data = res->data;
		*version = res->version;
	}
	/** @Interface: Update 
	@Ordering_Point_Set: Update_Point/*@ \label{line:rcuSpecUpdateInterfaceOPSet} @*/
	@ID: this/*@ \label{line:rcuSpecUpdateInterfaceID} @*/
	@SideEffect: _data = data; _version++; *//*@ \label{line:rcuSpecUpdateInterfaceSideEffect} @*/
	void update(int data) {
		bool succ = false;
		Node *newNode = new Node;/*@ \label{line:rcuSpecUpdateAlloc} @*/
		Node *prev = node.load(mo_acquire);/*@ \label{line:rcuSpecUpdateLoad} @*/
		do {
			newNode->data = data;
			newNode->version = prev->version + 1;
			succ = node.compare_exchange_strong(prev,/*@\label{line:rcuSpecUpdateCAS} @*/
				newNode, mo_acq_rel, mo_acquire);
			/** @Ordering_Point_Label_Check: succ/*@ \label{line:rcuSpecUpdateOPBegin} @*/
			@Label: Update_Point */
		} while (!succ);
	}
};

\end{lstlisting}
\vspace{-.2cm}
\caption{\label{fig:rcuSpecExample}Annotated RCU specification example}
\vspace{-.3cm}
\end{figure}

\subsection{Defining the Equivalent Sequential History}

While defining the equivalent sequential history for concurrent executions is well studied in the context of the
SC memory model, optimizations that developers typically use in the
context of weaker memory models create the following new challenges when
we generate the equivalent sequential history using ordering points.

\begin{itemize}
\item {\bf Absence of a Meaningful Trace or Total Order:}
For the SC memory model, an execution can be represented by a simple
interleaving of all the memory operations, where each load operation reads from
the last store operation to the same location in the trace.  However, under a
relaxed memory model like C/C++11, the interleaving does not uniquely define an
execution because a given load operation can potentially read from many
different store operations in the interleaving. Therefore, we have to rely on
the intrinsic relations between ordering points such as \textit{reads from} and
\textit{modification order} to order method calls.

\item {\bf Lack of Program Order Preserving Sequential History:}
Moreover, as discussed in Section~\ref{sec:exampleMotivation}, in
general it is not possible to arrange an execution in any totally
ordered sequential history that preserves program order.

A key insight is that many concurrent data structures' API methods
have a commit point, which is a single memory operation that makes the
update visible to other threads and that also serves as an ordering
point.  When two data structure operations have a dependence, it is
often the case that their respective commit points are both
conflicting accesses to the same memory location.  In this case, the
modification order provided by C/C++ is sufficient to order these
operations since modification order is guaranteed to be consistent
with the happens before relation (and therefore also the sequenced
before relation).

For cases where the method calls are independent, such as
a \code{put(X, 1)} followed by a
\code{get(Y)} in a hashtable where \code{X} and \code{Y} are different keys,
the lack of an ordering is not a problem since those methods commute.
\end{itemize}

\mypara{\bf Ordering Point Annotations:} In many cases, it is not
possible to determine whether a given atomic operation is an ordering 
point until later in the execution. For example, 
some internal methods may be called by multiple API methods. In this
case, the same atomic operation can be identified as a potential
ordering point for multiple API methods, and each API method later has
a checking annotation to verify whether it was a real ordering
point.  Therefore, the \TOOL specification separates the definition of ordering
points as follows:
\begin{enumerate}
\vspace{-.1cm}
\item {\code{Potential\_Ordering\_Point} annotation:} The labeling of ordering
points that identifies the location of a potential ordering point.
\vspace{-.2cm}
\item {\code{Ordering\_Point\_Check} annotation:} The checking of ordering
points that checks at a later point whether a potential ordering point was
really an ordering point.
\end{enumerate}
\vspace{-.1cm}

These two constructs together identify ordering points.  For
example, assume that $A$ is an atomic operation that is marked as a
potential ordering point with the label \code{LabelA} under the
condition \code{ConditionA}.  The developer would write a
\code{Potential\_Ordering\_Point} annotation with the label
\code{LabelA} and then use the label \code{LabelA} in an
\code{Ordering\_Point\_Check} annotation at a later point.

The \code{Ordering\_Point\_Label\_Check} annotation combines
the \code{Potential\_Ordering\_Point} and the
\code{Ordering\_Point\_Check} annotations, and makes
specifications simpler for the use case that the ordering point is known immediately.
For example, in Line~\ref{line:rcuSpecReadOPBegin} of
Figure~\ref{fig:rcuSpecExample}, we use the \code{Ordering\_Point\_Label\_Check} annotation to
identify the ordering point for \code{read} because we know the load operation
in Line~\ref{line:rcuSpecReadLoad} is an ordering point at the time it is
executed.

Some data structure operations may require multiple ordering points.
For example, consider a transaction implementation that first attempts
to lock all of the involved objects (dropping the locks and retrying
if it fails to acquire a lock), performs the updates, and then releases
the locks.  To order such transactions in a relaxed memory model, we must consider all of the
locks it acquires and not just the last lock.  Thus, we allow a
method invocation to have more than one ordering point, and the
additional ordering points serve to order the operation with respect to
multiple different memory locations.  For the transaction example, it
may be necessary to retry the acquisition of locks.  To support this
scenario, the \code{Ordering\_Point\_Clear} annotation removes all
previous ordering points when it satisfies a specific condition.

Moreover, when an API method calls another API method, they can share ordering points.  In
that case, \TOOL requires that at that ordering point, the concurrent data
structure should satisfy the precondition and postcondition of both the inner
and outer API methods.

\subsection{Checking the IO Behavior of Methods}

With the specified ordering points, \TOOL is able to generate the equivalent
sequential history. Developers then need to define the equivalent sequential
data structure. For example, in Line~\ref{line:declVar} and
~\ref{line:initVar} of the annotated RCU example, we use the
\code{Structure\_define} annotation to define the equivalent sequential RCU
by specifying the internal states as two integers, \code{\_verion} and \code{\_data}. In the
\code{Structure\_define} annotation, developers can also specify definitions for customized
structs and methods for convenience. We design these annotations in such a way
that developers can  
write specifications in C/C++ code such that they do not have to learn a new
specification language.

After defining the internal states and methods of the
equivalent data structure, developers use the \code{SideEffect} annotation to
define the corresponding sequential API methods, which should contain the action
to be performed on the equivalent sequential data structure. For example, in
Line~\ref{line:rcuSpecUpdateInterfaceSideEffect} of the annotated RCU example,
we use \code{SideEffect} to specify that when we execute the \code{update}
method on the equivalent sequential RCU, we should update the
internal states of \code{\_version} and \code{\_data} accordingly. When
the \code{SideEffect} annotation is omitted for a specific API method, it means
that no side effects will happen on the sequential data structure when that
method is called. Take the annotated RCU as an example, the \code{read} has no
side effects on the equivalent sequential RCU.

With the well-defined equivalent sequential data structure, developers then relate
the generated equivalent sequential history to the equivalent sequential data
structure. In \TOOL, we allow developers to accomplish this by using the
\code{PreCondition} and \code{PostCondition} annotations to specify the
conditions to be checked before and after the API method appears to happen.
For example, Line~\ref{line:rcuSpecReadInterfaceEnd} in the
annotated RCU example means that when \code{read} appears to happen, it should
return the same value as the current interal variables of the equivalent
sequential RCU.
Note that these two annotations contain legitimate C/C++
expressions that only access the method call parameters, return value and the
internal states of the equivalent sequential data structure.

\subsection{Checking Synchronization}

Under a relaxed memory model, compilers and processors can reorder
memory operations and thus the execution can exhibit counter-intuitive
behaviors. The C/C++11 memory model provides the developer with memory ordering that establish synchronization, e.g., \code{acquire}, \code{release}, \code{seq\_cst}.  Synchronization 
serves to control which reorderings are allowed --- however,
restricting reorderings comes at a runtime cost so developers must
balance complexity against runtime overhead.  Checking that data
structures establish proper synchronization is important to ensure
that the data structures can be effectively composed with client code.

We generalize the notion of happens before to methods as follows.
Method call $c_1$ happens-before method call $c_2$ if the invocation
event of $c_1$ happens before the response event of $c_2$.  Note that
by this definition two method calls can both happen before each other
--- an example of this is the \code{barrier} synchronization
construct.  For example, for a correctly synchronized queue, we want
an enqueue to happen before the corresponding dequeue, which avoids
the synchronization problems discussed earlier in
Section~\ref{sec:introNewChallenges}.

In order to flexibly express the synchronization between methods, we associate
API methods with method call identifiers (or IDs) and happens-before conditional guard
expressions. The method
call ID is a C/C++ expression that computes a unique ID for the call, and if it
is omitted, a default value is used. For example, in our RCU example in
Figure~\ref{fig:rcuSpecExample}, method \code{update} and \code{read} both
have the \code{this} pointer of the corresponding RCU object.
The \code{HB\_Condition} component
associates one happens-before conditional guard expression with a unique label.
For one method, multiple conditional guard expressions are allowed to be
defined, and the conditional guard expression can only access the method
instance's argument values and return value. 

After specifying the
method call IDs and the \code{HB\_Condition} labels, developers
can specify the synchronization as ``\code{method1(HB\_condition1)} \code{->}
\code{method2(HB\_condition2)}''. When the \code{HB\_condition} is omitted, it
defaults to \code{true}. The semantics of this expression is that all
instances of calls to \code{method1} that satisfy the conditional guard
expression \code{HB\_condition1} should happen-before all later instances (as determined by ordering points) of
calls to \code{method2} that satisfy the conditional guard expression
\code{HB\_condition2} such that both instances shared the same ID.
The ID and happens-before conditional guard expression are important because they
allow developers to impose synchronization only between specific method
invocations under specific conditions. For example, in
Figure~\ref{fig:rcuSpecExample}, Line~\ref{line:rcuSpecHB} specifies two
synchronization rules, which together mean that the
\code{update} should only establish synchronization with later \code{read} and
\code{update} from
the same RCU object under all circumstances.