1 \mysection{Evaluation}\label{sec:evaluation}
3 We have implemented \TOOL. Our evaluation focuses on the following
4 questions: (1) How expressive is \TOOL for specifying
5 the correctness properties of real-world concurrent data structures? (2) How easy is it
7 is the performance of \TOOL? (4) How effective was \TOOL
10 In order to evaluate \TOOL, we have gathered a contention
11 free lock, two types of concurrent queues, and a work stealing
12 deque~\cite{ppoppworkstealing}. As C/C++11 is relatively new
13 there are no C/C++11 implementations for many concurrent data
14 structures, thus we ported several data structures. The Linux
15 kernel's reader-writer spinlock and the Michael Scott queue were originally ported
16 for the \cdschecker benchmark suite. We also ported an RCU
17 implementation and Cliff Click's hashtable from its Java
18 implementation~\cite{clickhashtable}. We report execution times on an
21 \mysubsection{Expressiveness}\label{sec:expressiveness}
23 In this section, we evaluate the expressiveness of \TOOL by reporting
24 our experiences writing specifications for a range of concurrent data
27 \mypara{\bf Lockfree hashtable:} We ported Cliff Click's hashtable,
28 which supports simultaneous lookups and updates by multiple threads as
29 well as concurrent table resizing. The implementation uses an array
30 of atomic variables to store the key/value slots, and uses
31 acquire/release synchronization to establish the synchronization
32 between hashtable accesses.
34 Hashtable updates consist of two CAS operations --- one to claim
35 the key slot and one to update the value. When a \code{put} method
36 invocation successfully updates both the key and value, the
37 update is visible to other threads. Thus, both CAS operations are
38 ordering points for the \code{put} method, and we annotate both of
39 them as potential ordering points. The \code{get} method is ordered after
40 an invocation of the \code{put} only if it sees both the key and
41 value updates. Thus we annotate an ordering point for the
42 key read only if the key is null. We also
43 annotate an ordering point for the value read if
44 it reads the value slot. The test driver has two threads both of
45 which update and read the value for the same key.
47 \mypara{\bf RCU:} As discussed in the example, this is a
48 synchronization mechanism used in the Linux kernel that allows concurrent reads
49 and updates. We ran this benchmark with four threads, two update the data
50 structure and two read the data structure.
52 \mypara{\bf Chase-Lev Deque:} This is a bug-fixed version of a published C11 adaptation
53 of the Chase-Lev deque~\cite{ppoppworkstealing}.
54 It maintains a top and bottom index to a shared array of
55 references. In terms of synchronization,
56 when pushing an item into the sequential deque,
57 we attach a unique ID tag to that element. When stealing or taking an item, we
58 use that tag as the ID of the method call. Thus, we have (push, steal) or
59 (push, take) pairs that have the same method call ID.
60 In our test driver, one thread
61 pushes 3 items and takes 2 items while the other one steals 1
64 \mypara{\bf Linux Reader-Writer Lock:} A reader-writer lock allows either
65 multiple concurrent readers or one exclusive writer. We can abstract
66 it with a boolean \code{writer\_lock} representing
67 whether the writer lock is held and an integer \code{reader\_cnt}
68 representing the number of threads that are reading.
69 We test this benchmark with a single lock that
70 protects shared variables. We have two threads that read and write
71 the shared variables under the protection of a read lock and a write
74 \mypara{\bf MCS Lock:} This benchmark is an implementation of
75 the Mellor-Crummey and Scott lock~\cite{mcs-lock,
76 mcs-lock-url}. This lock queues waiting threads in a FIFO.
77 Our test driver utilizes two
78 threads that read and write shared variables with the protection of
81 \mypara{\bf M\&S Queue:} This benchmark is an adaptation of the
82 Michael and Scott lock free queue~\cite{lockfreequeue} to the C/C++
83 memory model. We ran with two threads, one of which enqueues and the
84 other of which dequeues an item.
86 \mypara{\bf SPSC Queue:} This is a lock-free single-producer,
87 single-consumer queue.
88 We used a test driver that has two
89 threads --- one enqueues a value and the other dequeues
92 \mypara{\bf MPMC Queue:} This is a multiple-producer, multiple-consumer
93 queue. Producers call \code{write\_prepare} to obtain a free
94 slot, update the slot, and call \code{write\_publish} to
95 publish it. Consumers call \code{read\_fetch} to obtain
96 a valid slot, read the slot, and call \code{read\_consume}
97 to free it. The specification focuses on the synchronization
98 properties which require \code{write\_publish} to synchronize with
99 \code{read\_fetch} to ensure the data integrity and
100 \code{read\_consume} to synchronize with \code{write\_prepare} to
101 ensure that slots are not prematurely recycled. The test driver
102 contains two threads, each of which enqueue and dequeue
105 \subsection{Ease of Use} In addition to expressiveness, it is also important for
106 specification languages to be easy to use. In our experience using \TOOL
107 to specify the real-world data
108 structures in our benchmark set, we found that \TOOL was easy to use. \TOOL specifications have only three
109 parts --- equivalent sequential data structures, ordering points, and
110 synchronization properties, and we explain the reasons as follows. (1) Specifying
111 sequential data structures is easy and straightforward, and developers can often
112 just use an off-the-shelf implementation from a library.
113 (2) When developers specify ordering points, they
114 only need to know what operations order methods without needing to specify
115 the subtle reasoning about the corner cases involving
116 interleavings and reorderings introduced by relaxed memory models. Take the
117 Michael \& Scott queue as an example, we can easily order enqueuers with the
118 point when \code{enqueue} loads the \code{tail} pointer right before inserting the new
120 (3) For synchronization, the fact that we allow specifying
121 synchronization at the abstraction of methods makes it easy. For example, we only used
122 36 lines to specify synchronization in a total 1,033 lines of
123 code (omitting blanks and comments).
130 \resizebox{\columnwidth}{!}{
131 \begin{tabular}{lrrr}
133 Benchmark & \# Executions & \# Feasible & Total Time (s) \\
135 Chase-Lev Deque & 1,365 & 232 & 0.15 \\
136 SPSC Queue & 19 & 15 & 0.01 \\
137 RCU & 1269 & 756 & 0.11 \\
138 Lockfree Hashtable & 30,941 & 25,731 & 11.39 \\
139 MCS Lock & 19,501 & 13,546 & 2.62 \\
140 MPMC Queue & 170,220 & 93,224 & 45.63 \\ % 1r2w
141 M\&S Queue & 168 & 114 & 0.05 \\
142 Linux RW Lock & 148,053 & 405 & 13.06 \\
143 % Results from: 2014-03-25-13:26
144 % Flags: -y -m 2 -u 3
149 \caption{\label{fig:benchmark_results} Benchmark results}
154 \mysubsection{Performance}\label{sec:performance}
156 Figure~\ref{fig:benchmark_results} presents performance results for
157 \TOOL on our benchmark set. We list the number of
158 the total executions that \cdschecker has explored, the number of the feasible
159 executions that we checked the specification for, and the time the benchmark
160 took to complete. All of our benchmarks complete within one minute
161 and most take less than 3 seconds to complete.
163 \mysubsection{Finding Bugs}\label{sec:findbugs}
170 \resizebox{\columnwidth}{!}{
171 \begin{tabular}{l|ccccc|c}
173 Benchmark & \# Injection & \# DR & \# UL & \# Correctness & \# Sync & Rate\\
175 Chase-Lev Deque & 10 & 0 & 2 & 3 & 2 & 70\%\\
176 SPSC Queue & 2 & 2 & 0 & 0 & 0 & 100\%\\
177 RCU & 3 & 0 & 0 & 1 & 2 & 100\%\\
178 Lockfree Hashtable & 5 & 0 & 0 & 0 & 2 & 40\%\\
179 MCS Lock & 4 & 0 & 0 & 0 & 4 & 100\%\\
180 MPMC Queue & 6 & 0 & 0 & 0 & 2 & 33\%\\
181 M\&S Queue & 11 & 0 & 6 & 3 & 0 & 82\%\\
182 Linux RW Lock & 8 & 0 & 0 & 0 & 8 & 100\%\\
184 Total & 49 & 2 & 8 & 7 & 20 & 76\%\\
187 % Results from: 2014-03-25-17:08
188 % Flags: -y -m 2 -u 3
192 \caption{\label{fig:injection_results} Bug injection detection results}
197 The next component of the evaluation examines the effectiveness of \TOOL for finding bugs.
199 \mypara{\bf New Bugs:} In the M\&S queue benchmark used
200 in~\cite{cdschecker}, the dequeue interface does not differentiate
201 between dequeuing the integer zero and returning that no item is
202 available, and it passed our initial specification. However, after
203 modifying the dequeue interface to match that in the original paper,
204 \TOOL is able to find a new bug that \cdschecker did not find. The
205 original test driver for this benchmark performed the enqueues first
206 to make it easy to write assertions that are valid for all executions.
207 \TOOL allows specification assertions to capture the behavior of the specific execution
208 and thus is able to discover the given bug.
210 \mypara {\bf Injected Bugs:} To further evaluate \TOOL,
211 we injected bugs in our benchmarks by weakening the ordering
212 parameter of the atomic operations. These include
213 changing \code{release}, \code{acquire}, \code{acq\_rel} and
214 \code{seq\_cst} to \code{relaxed}. We weakened one operation per
215 each trial, and covered all of the atomic operations that our
216 tests exercise. While this injection strategy may not
217 reproduce all types of errors that developers make, it does
218 simulate errors that are caused by misunderstanding the
219 complicated semantics of relaxed memory models.
221 This fault injection strategy will introduce one of two types of bugs.
222 The first type is a specification-independent bug, which can be
223 detected by the underlying \cdschecker infrastructure which includes
224 internal data races and uninitialized loads. The second type is a
225 specification-dependent bug, which passes the built-in checks
226 but violates the \TOOL specification. These include
227 failed assertions and synchronization violations. We
228 classifying bugs as follows. If \cdschecker reports a data
229 race or an uninitialized load, \TOOL reports the error and stops.
230 If not, \TOOL continues to check the
231 execution against the specification. It first
232 checks for violations of the preconditions and postconditions and then for
233 violations of the synchronization specification.
235 Figure~\ref{fig:injection_results} shows the results of the injection
236 detection. The column \textit{DR} represents data races, \textit{UL} represents
237 uninitialized loads, \textit{correctness} represents a failed precondition or
239 \textit{sync} represents a synchronization violation. The detection rate is the
240 number of injections for which we detected a bug divided by the total
241 number of injections.
243 \mypara{\bf Linux Reader-Writer Lock:} Our initial specification for this
244 benchmark did not allow \code{write\_trylock} to spuriously fail.
245 However, when we checked this benchmark against that specification,
246 \TOOL checker reports a correctness violation. We then analyzed the code
247 and found that \code{write\_trylock} first subtracts a bias from the
248 \code{lock} variable to attempt to acquire the lock, and restores that
249 bias if the attempt to acquire the lock fails. In the scenario where
250 two \code{write\_trylock} are racing for the lock before the lock is
251 released, one \code{write\_trylock} can first decrement the lock
252 variable, the lock can be released by the original holder, and then
253 the second \code{write\_trylock} can attempt to acquire the lock.
254 Even though the history indicates that the lock is unlocked, it
255 still holds a transient value due to the partially completed first
256 \code{write\_trylock} invocation. Thus, the second
257 \code{write\_trylock} invocation will also fail. As the second
258 \code{write\_trylock} serializes after both the first unsuccessful
259 \code{write\_trylock} and the \code{unlock} operation,
260 the sequential specification would force it to succeed. We then modified
262 of \code{write\_trylock} to allow spurious failures so that our correctness
263 model fits this data structure. This shows
264 \TOOL can help developers iteratively refine the
265 specifications of their data structures. By analyzing the \TOOL
266 diagnostic report, developers can better understand
267 any inconsistencies between the specification and the
270 \mypara{\bf MCS Lock:} Three of the weakened
271 operations are not detected because they cause the execution
272 to fail to terminate (and hit a trace bound). We
273 reviewed the code and found that weakening any of those three
274 operations makes the lock spin forever.
276 \mypara{\bf M\&S Queue:} Our test driver does not cause
277 an enqueue or dequeue thread to help another
278 enqueue thread update the tail pointer, which corresponds to two of the
279 undetected injections.
281 \mypara{\bf Lockfree Hashtable:} Our experiment only focuses on the two
282 primitive methods, \code{get} and \code{put} without triggering the
283 \code{resize}. We were able to successfully check all executions from a test driver for the lockfree hash table that generates executions that have no program order preserving sequential histories.
285 \mypara{\bf MPMC Queue:} The undetected injections in this benchmark are
286 primarily due to the limitation of our test driver. One
287 synchronization property of this benchmark is
288 that \code{read\_consume} should synchronize with the
289 next \code{write\_prepare} to ensure that a slot cannot be reused
290 before the consumer has finished with the slot. Our test driver is
291 unable to reach this case so those injections are not detected.
293 From our experiments on concurrent data structures, we can see that
294 \TOOL checker can help detect incorrect memory orderings,
295 help developers refine data structure specifications,
296 and help determine whether strong memory orderings are really
297 necessary. Since \TOOL checker is a unit testing tool, it is limited
298 to small-scale tests to explore common usage scenarios of the data
299 structures. As a unit testing tool, \TOOL was able to find 100\% of
300 injections for many data structures and to find 76\% of the injections
301 on average. For our 49 injections, 10 of them were detected by checks
302 in \cdschecker, and 27 additional injections were detected by
303 \TOOL. This shows that by writing specifications, we detect
304 significantly more fault injections.