2 This file is a part of libcds - Concurrent Data Structures library
4 (C) Copyright Maxim Khizhinsky (libcds.dev@gmail.com) 2006-2016
6 Source code repo: http://github.com/khizmax/libcds/
7 Download: http://sourceforge.net/projects/libcds/files/
9 Redistribution and use in source and binary forms, with or without
10 modification, are permitted provided that the following conditions are met:
12 * Redistributions of source code must retain the above copyright notice, this
13 list of conditions and the following disclaimer.
15 * Redistributions in binary form must reproduce the above copyright notice,
16 this list of conditions and the following disclaimer in the documentation
17 and/or other materials provided with the distribution.
19 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
31 #ifndef CDSLIB_CONTAINER_IMPL_FELDMAN_HASHSET_H
32 #define CDSLIB_CONTAINER_IMPL_FELDMAN_HASHSET_H
34 #include <cds/intrusive/impl/feldman_hashset.h>
35 #include <cds/container/details/feldman_hashset_base.h>
37 namespace cds { namespace container {
39 /// Hash set based on multi-level array
40 /** @ingroup cds_nonintrusive_set
41 @anchor cds_container_FeldmanHashSet_hp
44 - [2013] Steven Feldman, Pierre LaBorde, Damian Dechev "Concurrent Multi-level Arrays:
45 Wait-free Extensible Hash Maps"
47 [From the paper] The hardest problem encountered while developing a parallel hash map is how to perform
48 a global resize, the process of redistributing the elements in a hash map that occurs when adding new
49 buckets. The negative impact of blocking synchronization is multiplied during a global resize, because all
50 threads will be forced to wait on the thread that is performing the involved process of resizing the hash map
51 and redistributing the elements. \p %FeldmanHashSet implementation avoids global resizes through new array
52 allocation. By allowing concurrent expansion this structure is free from the overhead of an explicit resize,
53 which facilitates concurrent operations.
55 The presented design includes dynamic hashing, the use of sub-arrays within the hash map data structure;
56 which, in combination with <b>perfect hashing</b>, means that each element has a unique final, as well as current, position.
57 It is important to note that the perfect hash function required by our hash map is trivial to realize as
58 any hash function that permutes the bits of the key is suitable. This is possible because of our approach
59 to the hash function; we require that it produces hash values that are equal in size to that of the key.
60 We know that if we expand the hash map a fixed number of times there can be no collision as duplicate keys
61 are not provided for in the standard semantics of a hash map.
63 \p %FeldmanHashSet is a multi-level array which has an internal structure similar to a tree:
64 @image html feldman_hashset.png
65 The multi-level array differs from a tree in that each position on the tree could hold an array of nodes or a single node.
66 A position that holds a single node is a \p dataNode which holds the hash value of a key and the value that is associated
67 with that key; it is a simple struct holding two variables. A \p dataNode in the multi-level array could be marked.
68 A \p markedDataNode refers to a pointer to a \p dataNode that has been bitmarked at the least significant bit (LSB)
69 of the pointer to the node. This signifies that this \p dataNode is contended. An expansion must occur at this node;
70 any thread that sees this \p markedDataNode will try to replace it with an \p arrayNode; which is a position that holds
71 an array of nodes. The pointer to an \p arrayNode is differentiated from that of a pointer to a \p dataNode by a bitmark
72 on the second-least significant bit.
74 \p %FeldmanHashSet multi-level array is similar to a tree in that we keep a pointer to the root, which is a memory array
75 called \p head. The length of the \p head memory array is unique, whereas every other \p arrayNode has a uniform length;
76 a normal \p arrayNode has a fixed power-of-two length equal to the binary logarithm of a variable called \p arrayLength.
77 The maximum depth of the tree, \p maxDepth, is the maximum number of pointers that must be followed to reach any node.
78 We define \p currentDepth as the number of memory arrays that we need to traverse to reach the \p arrayNode on which
79 we need to operate; this is initially one, because of \p head.
81 That approach to the structure of the hash set uses an extensible hashing scheme; <b> the hash value is treated as a bit
82 string</b> and rehash incrementally.
84 @note Two important things you should keep in mind when you're using \p %FeldmanHashSet:
85 - all keys must be fixed-size. It means that you cannot use \p std::string as a key for \p %FeldmanHashSet.
86 Instead, for the strings you should use well-known hashing algorithms like <a href="https://en.wikipedia.org/wiki/Secure_Hash_Algorithm">SHA1, SHA2</a>,
87 <a href="https://en.wikipedia.org/wiki/MurmurHash">MurmurHash</a>, <a href="https://en.wikipedia.org/wiki/CityHash">CityHash</a>
88 or its successor <a href="https://code.google.com/p/farmhash/">FarmHash</a> and so on, which
89 converts variable-length strings to fixed-length bit-strings, and use that hash as a key in \p %FeldmanHashSet.
90 - \p %FeldmanHashSet uses a perfect hashing. It means that if two different keys, for example, of type \p std::string,
91 have identical hash then you cannot insert both that keys in the set. \p %FeldmanHashSet does not maintain the key,
92 it maintains its fixed-size hash value.
94 The set supports @ref cds_container_FeldmanHashSet_iterators "bidirectional thread-safe iterators".
97 - \p GC - safe memory reclamation schema. Can be \p gc::HP, \p gc::DHP or one of \ref cds_urcu_type "RCU type"
98 - \p T - a value type to be stored in the set
99 - \p Traits - type traits, the structure based on \p feldman_hashset::traits or result of \p feldman_hashset::make_traits metafunction.
100 \p Traits is the mandatory argument because it has one mandatory type - an @ref feldman_hashset::traits::hash_accessor "accessor"
101 to hash value of \p T. The set algorithm does not calculate that hash value.
103 There are several specializations of \p %FeldmanHashSet for each \p GC. You should include:
104 - <tt><cds/container/feldman_hashset_hp.h></tt> for \p gc::HP garbage collector
105 - <tt><cds/container/feldman_hashset_dhp.h></tt> for \p gc::DHP garbage collector
106 - <tt><cds/container/feldman_hashset_rcu.h></tt> for \ref cds_intrusive_FeldmanHashSet_rcu "RCU type". RCU specialization
107 has a slightly different interface.
112 #ifdef CDS_DOXYGEN_INVOKED
113 , class Traits = feldman_hashset::traits
119 #ifdef CDS_DOXYGEN_INVOKED
120 : protected cds::intrusive::FeldmanHashSet< GC, T, Traits >
122 : protected cds::container::details::make_feldman_hashset< GC, T, Traits >::type
126 typedef cds::container::details::make_feldman_hashset< GC, T, Traits > maker;
127 typedef typename maker::type base_class;
131 typedef GC gc; ///< Garbage collector
132 typedef T value_type; ///< type of value stored in the set
133 typedef Traits traits; ///< Traits template parameter, see \p feldman_hashset::traits
135 typedef typename base_class::hash_accessor hash_accessor; ///< Hash accessor functor
136 typedef typename base_class::hash_type hash_type; ///< Hash type deduced from \p hash_accessor return type
137 typedef typename base_class::hash_comparator hash_comparator; ///< hash compare functor based on \p opt::compare and \p opt::less option setter
139 typedef typename traits::item_counter item_counter; ///< Item counter type
140 typedef typename traits::allocator allocator; ///< Element allocator
141 typedef typename traits::node_allocator node_allocator; ///< Array node allocator
142 typedef typename traits::memory_model memory_model; ///< Memory model
143 typedef typename traits::back_off back_off; ///< Backoff strategy
144 typedef typename traits::stat stat; ///< Internal statistics type
146 typedef typename gc::template guarded_ptr< value_type > guarded_ptr; ///< Guarded pointer
148 /// Count of hazard pointers required
149 static CDS_CONSTEXPR size_t const c_nHazardPtrCount = base_class::c_nHazardPtrCount;
151 /// The size of hash_type in bytes, see \p feldman_hashset::traits::hash_size for explanation
152 static CDS_CONSTEXPR size_t const c_hash_size = base_class::c_hash_size;
155 typedef feldman_hashset::level_statistics level_statistics;
159 typedef typename maker::cxx_node_allocator cxx_node_allocator;
160 typedef std::unique_ptr< value_type, typename maker::node_disposer > scoped_node_ptr;
164 ///@name Thread-safe iterators
166 /// Bidirectional iterator
167 /** @anchor cds_container_FeldmanHashSet_iterators
168 The set supports thread-safe iterators: you may iterate over the set in multi-threaded environment.
169 It is guaranteed that the iterators will remain valid even if another thread deletes the node the iterator points to:
170 Hazard Pointer embedded into the iterator object protects the node from physical reclamation.
172 @note Since the iterator object contains hazard pointer that is a thread-local resource,
173 the iterator should not be passed to another thread.
175 Each iterator object supports the following interface:
176 - dereference operators:
178 value_type [const] * operator ->() noexcept
179 value_type [const] & operator *() noexcept
181 - pre-increment and pre-decrement. Post-operators is not supported
182 - equality operators <tt>==</tt> and <tt>!=</tt>.
183 Iterators are equal iff they point to the same cell of the same array node.
184 Note that for two iterators \p it1 and \p it2, the conditon <tt> it1 == it2 </tt>
185 does not entail <tt> &(*it1) == &(*it2) </tt>
186 - helper member function \p release() that clears internal hazard pointer.
187 After \p release() the iterator points to \p nullptr but it still remain valid: further iterating is possible.
189 During iteration you may safely erase any item from the set;
190 @ref erase_at() function call doesn't invalidate any iterator.
191 If some iterator points to the item to be erased, that item is not deleted immediately
192 but only after that iterator will be advanced forward or backward.
194 @note It is possible the item can be iterated more that once, for example, if an iterator points to the item
195 in array node that is being splitted.
197 typedef typename base_class::iterator iterator;
198 typedef typename base_class::const_iterator const_iterator; ///< @ref cds_container_FeldmanHashSet_iterators "bidirectional const iterator" type
199 typedef typename base_class::reverse_iterator reverse_iterator; ///< @ref cds_container_FeldmanHashSet_iterators "bidirectional reverse iterator" type
200 typedef typename base_class::const_reverse_iterator const_reverse_iterator; ///< @ref cds_container_FeldmanHashSet_iterators "bidirectional reverse const iterator" type
202 /// Returns an iterator to the beginning of the set
205 return base_class::begin();
208 /// Returns an const iterator to the beginning of the set
209 const_iterator begin() const
211 return base_class::begin();
214 /// Returns an const iterator to the beginning of the set
215 const_iterator cbegin()
217 return base_class::cbegin();
220 /// Returns an iterator to the element following the last element of the set. This element acts as a placeholder; attempting to access it results in undefined behavior.
223 return base_class::end();
226 /// Returns a const iterator to the element following the last element of the set. This element acts as a placeholder; attempting to access it results in undefined behavior.
227 const_iterator end() const
229 return base_class::end();
232 /// Returns a const iterator to the element following the last element of the set. This element acts as a placeholder; attempting to access it results in undefined behavior.
233 const_iterator cend()
235 return base_class::cend();
238 /// Returns a reverse iterator to the first element of the reversed set
239 reverse_iterator rbegin()
241 return base_class::rbegin();
244 /// Returns a const reverse iterator to the first element of the reversed set
245 const_reverse_iterator rbegin() const
247 return base_class::rbegin();
250 /// Returns a const reverse iterator to the first element of the reversed set
251 const_reverse_iterator crbegin()
253 return base_class::crbegin();
256 /// Returns a reverse iterator to the element following the last element of the reversed set
258 It corresponds to the element preceding the first element of the non-reversed container.
259 This element acts as a placeholder, attempting to access it results in undefined behavior.
261 reverse_iterator rend()
263 return base_class::rend();
266 /// Returns a const reverse iterator to the element following the last element of the reversed set
268 It corresponds to the element preceding the first element of the non-reversed container.
269 This element acts as a placeholder, attempting to access it results in undefined behavior.
271 const_reverse_iterator rend() const
273 return base_class::rend();
276 /// Returns a const reverse iterator to the element following the last element of the reversed set
278 It corresponds to the element preceding the first element of the non-reversed container.
279 This element acts as a placeholder, attempting to access it results in undefined behavior.
281 const_reverse_iterator crend()
283 return base_class::crend();
288 /// Creates empty set
290 @param head_bits: 2<sup>head_bits</sup> specifies the size of head array, minimum is 4.
291 @param array_bits: 2<sup>array_bits</sup> specifies the size of array node, minimum is 2.
293 Equation for \p head_bits and \p array_bits:
295 sizeof(hash_type) * 8 == head_bits + N * array_bits
297 where \p N is multi-level array depth.
299 FeldmanHashSet( size_t head_bits = 8, size_t array_bits = 4 )
300 : base_class( head_bits, array_bits )
303 /// Destructs the set and frees all data
307 /// Inserts new element
309 The function creates an element with copy of \p val value and then inserts it into the set.
311 The type \p Q should contain as minimum the complete hash for the element.
312 The object of \ref value_type should be constructible from a value of type \p Q.
313 In trivial case, \p Q is equal to \ref value_type.
315 Returns \p true if \p val is inserted into the set, \p false otherwise.
317 template <typename Q>
318 bool insert( Q const& val )
320 scoped_node_ptr sp( cxx_node_allocator().New( val ));
321 if ( base_class::insert( *sp )) {
328 /// Inserts new element
330 The function allows to split creating of new item into two part:
331 - create item with key only
332 - insert new item into the set
333 - if inserting is success, calls \p f functor to initialize value-fields of \p val.
335 The functor signature is:
337 void func( value_type& val );
339 where \p val is the item inserted. User-defined functor \p f should guarantee that during changing
340 \p val no any other changes could be made on this set's item by concurrent threads.
341 The user-defined functor is called only if the inserting is success.
343 template <typename Q, typename Func>
344 bool insert( Q const& val, Func f )
346 scoped_node_ptr sp( cxx_node_allocator().New( val ));
347 if ( base_class::insert( *sp, f )) {
354 /// Updates the element
356 The operation performs inserting or replacing with lock-free manner.
358 If the \p val key not found in the set, then the new item created from \p val
359 will be inserted into the set iff \p bInsert is \p true.
360 Otherwise, if \p val is found, it is replaced with new item created from \p val
361 and previous item is disposed.
362 In both cases \p func functor is called.
364 The functor \p Func signature:
367 void operator()( value_type& cur, value_type * prev );
371 - \p cur - current element
372 - \p prev - pointer to previous element with such hash. \p prev is \p nullptr
373 if \p cur was just inserted.
375 The functor may change non-key fields of the \p item; however, \p func must guarantee
376 that during changing no any other modifications could be made on this item by concurrent threads.
378 Returns <tt> std::pair<bool, bool> </tt> where \p first is \p true if operation is successful,
379 i.e. the item has been inserted or updated,
380 \p second is \p true if the new item has been added or \p false if the item with key equal to \p val
383 template <typename Q, typename Func>
384 std::pair<bool, bool> update( Q const& val, Func func, bool bInsert = true )
386 scoped_node_ptr sp( cxx_node_allocator().New( val ));
387 std::pair<bool, bool> bRes = base_class::do_update( *sp, func, bInsert );
393 /// Inserts data of type \p value_type created in-place from <tt>std::forward<Args>(args)...</tt>
395 Returns \p true if inserting successful, \p false otherwise.
397 template <typename... Args>
398 bool emplace( Args&&... args )
400 scoped_node_ptr sp( cxx_node_allocator().MoveNew( std::forward<Args>(args)... ));
401 if ( base_class::insert( *sp )) {
408 /// Deletes the item from the set
410 The function searches \p hash in the set,
411 deletes the item found, and returns \p true.
412 If that item is not found the function returns \p false.
414 bool erase( hash_type const& hash )
416 return base_class::erase( hash );
419 /// Deletes the item from the set
421 The function searches \p hash in the set,
422 call \p f functor with item found, and deltes the element from the set.
424 The \p Func interface is
427 void operator()( value_type& item );
431 If \p hash is not found the function returns \p false.
433 template <typename Func>
434 bool erase( hash_type const& hash, Func f )
436 return base_class::erase( hash, f );
439 /// Deletes the item pointed by iterator \p iter
441 Returns \p true if the operation is successful, \p false otherwise.
443 The function does not invalidate the iterator, it remains valid and can be used for further traversing.
445 bool erase_at( iterator const& iter )
447 return base_class::erase_at( iter );
450 bool erase_at( reverse_iterator const& iter )
452 return base_class::erase_at( iter );
456 /// Extracts the item with specified \p hash
458 The function searches \p hash in the set,
459 unlinks it from the set, and returns a guarded pointer to the item extracted.
460 If \p hash is not found the function returns an empty guarded pointer.
462 The item returned is reclaimed by garbage collector \p GC
463 when returned \ref guarded_ptr object to be destroyed or released.
464 @note Each \p guarded_ptr object uses the GC's guard that can be limited resource.
468 typedef cds::container::FeldmanHashSet< your_template_args > my_set;
472 my_set::guarded_ptr gp( theSet.extract( 5 ));
477 // Destructor of gp releases internal HP guard
481 guarded_ptr extract( hash_type const& hash )
483 return base_class::extract( hash );
486 /// Finds an item by it's \p hash
488 The function searches the item by \p hash and calls the functor \p f for item found.
489 The interface of \p Func functor is:
492 void operator()( value_type& item );
495 where \p item is the item found.
497 The functor may change non-key fields of \p item. Note that the functor is only guarantee
498 that \p item cannot be disposed during the functor is executing.
499 The functor does not serialize simultaneous access to the set's \p item. If such access is
500 possible you must provide your own synchronization schema on item level to prevent unsafe item modifications.
502 The function returns \p true if \p hash is found, \p false otherwise.
504 template <typename Func>
505 bool find( hash_type const& hash, Func f )
507 return base_class::find( hash, f );
510 /// Checks whether the set contains \p hash
512 The function searches the item by its \p hash
513 and returns \p true if it is found, or \p false otherwise.
515 bool contains( hash_type const& hash )
517 return base_class::contains( hash );
520 /// Finds an item by it's \p hash and returns the item found
522 The function searches the item by its \p hash
523 and returns the guarded pointer to the item found.
524 If \p hash is not found the function returns an empty \p guarded_ptr.
526 @note Each \p guarded_ptr object uses one GC's guard which can be limited resource.
530 typedef cds::container::FeldmanHashSet< your_template_params > my_set;
534 my_set::guarded_ptr gp( theSet.get( 5 ));
535 if ( theSet.get( 5 )) {
539 // Destructor of guarded_ptr releases internal HP guard
543 guarded_ptr get( hash_type const& hash )
545 return base_class::get( hash );
548 /// Clears the set (non-atomic)
550 The function unlink all data node from the set.
551 The function is not atomic but is thread-safe.
552 After \p %clear() the set may not be empty because another threads may insert items.
559 /// Checks if the set is empty
561 Emptiness is checked by item counting: if item count is zero then the set is empty.
562 Thus, the correct item counting feature is an important part of the set implementation.
566 return base_class::empty();
569 /// Returns item count in the set
572 return base_class::size();
575 /// Returns const reference to internal statistics
576 stat const& statistics() const
578 return base_class::statistics();
581 /// Returns the size of head node
582 size_t head_size() const
584 return base_class::head_size();
587 /// Returns the size of the array node
588 size_t array_node_size() const
590 return base_class::array_node_size();
593 /// Collects tree level statistics into \p stat
595 The function traverses the set and collects statistics for each level of the tree
596 into \p feldman_hashset::level_statistics struct. The element of \p stat[i]
597 represents statistics for level \p i, level 0 is head array.
598 The function is thread-safe and may be called in multi-threaded environment.
600 Result can be useful for estimating efficiency of hash functor you use.
602 void get_level_statistics(std::vector< feldman_hashset::level_statistics>& stat) const
604 base_class::get_level_statistics(stat);
608 }} // namespace cds::container
610 #endif // #ifndef CDSLIB_CONTAINER_IMPL_FELDMAN_HASHSET_H