blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe
author Tejun Heo <tj@kernel.org>
Tue, 23 Sep 2014 19:24:32 +0000 (15:24 -0400)
committer Jens Axboe <axboe@fb.com>
Wed, 24 Sep 2014 14:29:36 +0000 (08:29 -0600)
blk-mq uses percpu_ref for its usage counter, which tracks the number
of in-flight commands and is used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measurable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
queue takes measurable wallclock time.  One would think that this
shouldn't matter as queue shutdown should be a rare event which takes
place asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take more than ten seconds
when scsi-mq is used.

This will be properly fixed by implementing a mechanism to keep
q->mq_usage_counter in atomic mode till genhd registration; however,
that involves rather big updates to percpu_ref which are difficult to
apply late in the devel cycle (v3.17-rc6 at the moment).  As a
stop-gap measure till the proper fix can be implemented in the next
cycle, this patch introduces __percpu_ref_kill_expedited() and makes
blk_mq_freeze_queue() use it.  This is heavy-handed but should work
for testing the experimental SCSI blk-mq implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
block/blk-mq.c
include/linux/percpu-refcount.h
lib/percpu-refcount.c

index c88e6089746d82267c61d572baa25fe855704f22..df8e1e09dd172d9ab67dbd0a91e519c5c01919ef 100644
@@ -119,7 +119,16 @@ void blk_mq_freeze_queue(struct request_queue *q)
        spin_unlock_irq(q->queue_lock);
 
        if (freeze) {
-               percpu_ref_kill(&q->mq_usage_counter);
+               /*
+                * XXX: Temporary kludge to work around SCSI blk-mq stall.
+                * SCSI synchronously creates and destroys many queues
+                * back-to-back during probe leading to lengthy stalls.
+                * This will be fixed by keeping ->mq_usage_counter in
+                * atomic mode until genhd registration, but, for now,
+                * let's work around using expedited synchronization.
+                */
+               __percpu_ref_kill_expedited(&q->mq_usage_counter);
+
                blk_mq_run_queues(q, false);
        }
        wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->mq_usage_counter));
index 3dfbf237cd8f32fc684e2934e7a8ace0da043b14..ef5894ca8e503d9171f74cce2f63298c703333d3 100644
@@ -71,6 +71,7 @@ void percpu_ref_reinit(struct percpu_ref *ref);
 void percpu_ref_exit(struct percpu_ref *ref);
 void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
                                 percpu_ref_func_t *confirm_kill);
+void __percpu_ref_kill_expedited(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
index fe5a3342e9607d8ee9bd11678ef31d070a81ccf5..a89cf09a82684d729222699afb1f911162977ba7 100644
@@ -184,3 +184,19 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
        call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_kill_and_confirm);
+
+/*
+ * XXX: Temporary kludge to work around SCSI blk-mq stall.  Used only by
+ * block/blk-mq.c::blk_mq_freeze_queue().  Will be removed during v3.18
+ * devel cycle.  Do not use anywhere else.
+ */
+void __percpu_ref_kill_expedited(struct percpu_ref *ref)
+{
+       WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
+                 "percpu_ref_kill() called more than once on %pf!",
+                 ref->release);
+
+       ref->pcpu_count_ptr |= PCPU_REF_DEAD;
+       synchronize_sched_expedited();
+       percpu_ref_kill_rcu(&ref->rcu);
+}