+#1: The new bug we found in this data structure is as the following:
+
+In dequeue(), the load of Tail has to be acquire. If not, the load of the next
+field won't get synchronize with the enqueuer that inserted the element right
+after this dequeue() operation. We can catch this bug using the two independent
+dequeuers and 1 (or 2) enqueuer.
+
+#2: When writing the speicification for this data structure, we have been
+considering the following invariance of the enqueue() and dequeue() operations.
+For enqueue() to succeed, it finally loads the Tail, inserts the new element by
+updating the next field right after the Tail load. The load of Tail actually
+orders the enqueuers, and the CAS of the next field will order the enqueuer and
+its dequeuer. For dequeue() to succeed, it either loads the Head and Tail, and
+then bails; or it loads the Head and then loads the next field. Similarly, the
+load of Head orders dequeuers.
+
+We have one key "additional_ordering_point" in enqueue() method for ordering the
+dequeuer and its immediately next enqueuer. For example, when we have two
+threads, 1 dequeuer and 1 enqueuer, when dequeue() returns NULL, we can order
+dequeue() before enqueue() by using the additional ordering point of enqueue()
+where the swing of Tail (or just the load of Tail when someone else swings it).
+Since dequeue() will load the Tail, and enqueue() will CAS or load the Tail
+later, deuqeue() can be ordered() before enqueue(). Notably here, additional
+ordering points are only used when methods has 'same-location' commit points and
+they can't order each other.