Fixing the barriers speeds up a micro-benchmark (16 threads accessing an RCU-protected split-list set) by 30-40%. Thanks to Todd Lipcon, who found this improvement.
Chapter 6 "User-Level Implementations of Read-Copy Update"
- [2011] M.Desnoyers, P.McKenney, A.Stern, M.Dagenais, J.Walpole "User-Level
Implementations of Read-Copy Update"
+ - [2012] M.Desnoyers, P.McKenney, A.Stern, M.Dagenais, J.Walpole "Supplementary
+ Material for User-Level Implementations of Read-Copy Update"
<b>Informal introduction to user-space %RCU</b>
design, thus being appropriate for use within a general-purpose library, but it has
relatively high read-side overhead. The \p libcds library contains several implementations of general-purpose
%RCU: \ref general_instant, \ref general_buffered, \ref general_threaded.
- - \ref signal_buffered: the signal-handling %RCU presents an implementation having low read-side overhead and
+ - \p signal_buffered: the signal-handling %RCU presents an implementation having low read-side overhead and
requiring only that the application give up one POSIX signal to %RCU update processing.
@note The signal-handling %RCU is defined only for UNIX-like systems; it is not available on Windows.
uint32_t tmp = pRec->m_nAccessControl.load( atomics::memory_order_relaxed );
if ( (tmp & rcu_class::c_nNestMask) == 0 ) {
pRec->m_nAccessControl.store( gp_singleton<RCUtag>::instance()->global_control_word(atomics::memory_order_relaxed),
- atomics::memory_order_release );
- atomics::atomic_thread_fence( atomics::memory_order_acquire );
- CDS_COMPILER_RW_BARRIER;
+ atomics::memory_order_relaxed );
+
+ // acquire barrier
+ pRec->m_nAccessControl.load( atomics::memory_order_acquire );
}
else {
- pRec->m_nAccessControl.fetch_add( 1, atomics::memory_order_relaxed );
+ // nested lock
+ pRec->m_nAccessControl.store( tmp + 1, atomics::memory_order_relaxed );
}
}
thread_record * pRec = get_thread_record();
assert( pRec != nullptr );
- CDS_COMPILER_RW_BARRIER;
- pRec->m_nAccessControl.fetch_sub( 1, atomics::memory_order_release );
+ uint32_t tmp = pRec->m_nAccessControl.load( atomics::memory_order_relaxed );
+ assert( (tmp & rcu_class::c_nNestMask) > 0 );
+
+ pRec->m_nAccessControl.store( tmp - 1, atomics::memory_order_release );
}
template <typename RCUtag>
assert( pRec != nullptr );
uint32_t tmp = pRec->m_nAccessControl.load( atomics::memory_order_relaxed );
if ( (tmp & rcu_class::c_nNestMask) == 0 ) {
- pRec->m_nAccessControl.store(
- sh_singleton<RCUtag>::instance()->global_control_word(atomics::memory_order_acquire),
- atomics::memory_order_release
- );
+ pRec->m_nAccessControl.store( sh_singleton<RCUtag>::instance()->global_control_word(atomics::memory_order_relaxed),
+ atomics::memory_order_relaxed );
+
+ // acquire barrier
+ pRec->m_nAccessControl.load( atomics::memory_order_acquire );
}
else {
- pRec->m_nAccessControl.fetch_add( 1, atomics::memory_order_release );
+ // nested lock
+ pRec->m_nAccessControl.store( tmp + 1, atomics::memory_order_relaxed );
}
- CDS_COMPILER_RW_BARRIER;
}
template <typename RCUtag>
thread_record * pRec = get_thread_record();
assert( pRec != nullptr);
- CDS_COMPILER_RW_BARRIER;
- pRec->m_nAccessControl.fetch_sub( 1, atomics::memory_order_release );
+ uint32_t tmp = pRec->m_nAccessControl.load( atomics::memory_order_relaxed );
+ assert( ( tmp & rcu_class::c_nNestMask ) > 0 );
+
+ pRec->m_nAccessControl.store( tmp - 1, atomics::memory_order_release );
}
template <typename RCUtag>
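The barrier change in the patches above can be illustrated with a minimal, self-contained sketch using standard C++ atomics. The names (`g_ctl`, `t_access`) and the control-word layout here are illustrative assumptions, not the actual libcds internals; the point is the idiom itself: on the outermost read-lock, publish the control word with a relaxed store, then issue an acquire load on the same variable instead of a full fence.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative layout (hypothetical): low 16 bits hold the read-side
// nesting depth, upper bits hold the grace-period phase.
constexpr uint32_t c_nNestMask = 0x0000FFFF;

std::atomic<uint32_t> g_ctl{ 0x00010000 };        // global control word (sketch)
thread_local std::atomic<uint32_t> t_access{ 0 }; // per-thread access word (sketch)

void access_lock()
{
    uint32_t tmp = t_access.load( std::memory_order_relaxed );
    if ( (tmp & c_nNestMask) == 0 ) {
        // Outermost lock: copy the global phase and set nesting depth to 1
        // with a relaxed store...
        t_access.store( g_ctl.load( std::memory_order_relaxed ) | 1,
                        std::memory_order_relaxed );
        // ...then use an acquire load of the same variable as the barrier;
        // this is cheaper than a full fence on weakly ordered CPUs.
        t_access.load( std::memory_order_acquire );
    }
    else {
        // Nested lock: plain increment, no barrier needed.
        t_access.store( tmp + 1, std::memory_order_relaxed );
    }
}

void access_unlock()
{
    uint32_t tmp = t_access.load( std::memory_order_relaxed );
    assert( (tmp & c_nNestMask) > 0 );
    // Release store ends the read-side critical section.
    t_access.store( tmp - 1, std::memory_order_release );
}
```

Nested calls only touch the thread-local word with relaxed operations, so the acquire barrier is paid once per outermost critical section.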
- Changed: exception handling. Exceptions are now raised by invoking the new
cds::throw_exception() function. If you compile your code with exceptions disabled,
the function prints the exception message to stdout and calls abort()
- instead of throwing. You can provide your own cds::throw_exception() function
- and compile libcds with -DCDS_USER_DEFINED_THROW_EXCEPTION.
+ instead of throwing.
+ - Flat Combining: fixed a memory-ordering bug that could lead to a crash on weakly
+ ordered architectures such as PowerPC or ARM
- Added: erase_at( iterator ) function to MichaelHashSet/Map and SplitListSet/Map
based on IterableList
- Fixed a bug in BronsonAVLTreeMap::extract_min()/extract_max()/clear().
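The throw-or-abort behavior described in the exception-handling entry above might be sketched as follows. This is a hypothetical re-creation, not the actual libcds source; the feature-test macros used are the standard `__cpp_exceptions` and MSVC's `_CPPUNWIND`.

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <utility>

// Hypothetical sketch of a throw-or-abort helper: throws when exceptions
// are enabled, otherwise prints the message to stdout and aborts.
template <typename Exception>
[[noreturn]] void throw_exception( Exception&& ex )
{
#if defined(__cpp_exceptions) || defined(_CPPUNWIND)
    throw std::forward<Exception>( ex );
#else
    std::fprintf( stdout, "%s\n", ex.what() );
    std::abort();
#endif
}
```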
Nikolai Rapotkin
rwf (https://github.com/rfw)
Tamas Lengyel
+Todd Lipcon