Tejun Heo [Thu, 3 Nov 2011 17:52:11 +0000 (18:52 +0100)]
block: don't call blk_drain_queue() if elevator is not up
blk_cleanup_queue() may be called before elevator is set up on a
queue which triggers the following oops.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70
...
Pid: 830, comm: kworker/0:2 Not tainted 3.1.0-next-20111025_64+ #1590
Bochs Bochs
RIP: 0010:[<
ffffffff8125a69c>] [<
ffffffff8125a69c>] elv_drain_elevator+0x1c/0x70
...
Call Trace:
[<
ffffffff8125da92>] blk_drain_queue+0x42/0x70
[<
ffffffff8125db90>] blk_cleanup_queue+0xd0/0x1c0
[<
ffffffff81469640>] md_free+0x50/0x70
[<
ffffffff8126f43b>] kobject_release+0x8b/0x1d0
[<
ffffffff81270d56>] kref_put+0x36/0xa0
[<
ffffffff8126f2b7>] kobject_put+0x27/0x60
[<
ffffffff814693af>] mddev_delayed_delete+0x2f/0x40
[<
ffffffff81083450>] process_one_work+0x100/0x3b0
[<
ffffffff8108527f>] worker_thread+0x15f/0x3a0
[<
ffffffff81089937>] kthread+0x87/0x90
[<
ffffffff81621834>] kernel_thread_helper+0x4/0x10
Fix it by making blk_cleanup_queue() check whether q->elevator is set
up before invoking blk_drain_queue.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 25 Oct 2011 13:51:48 +0000 (15:51 +0200)]
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
We can't use the latter if !CONFIG_LOCKDEP.
Reported-by: Sedat Dilek <sedat.dilek@googlemail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vivek Goyal [Tue, 25 Oct 2011 13:48:12 +0000 (15:48 +0200)]
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blkcg->policy_list is protected by blkcg->lock. Its not rcu protected
list. So even for readers, they need to take blkcg->lock. There are
few functions which were reading the list without taking lock. Fix it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vivek Goyal [Tue, 25 Oct 2011 13:48:12 +0000 (15:48 +0200)]
blk-throttle: Free up policy node associated with deleted rule
If a rule is being deleted, free up associated policy node. Otherwise
that memory is leaked.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tao Ma [Tue, 25 Oct 2011 08:20:05 +0000 (10:20 +0200)]
block: warn if tag is greater than real_max_depth.
In case tag depth is reduced, it is max_depth not real_max_depth.
So we should allow a request with tag >= max_depth, but for a
tag >= real_max_depth, there really should be some problem.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 24 Oct 2011 14:24:38 +0000 (16:24 +0200)]
Merge branch 'for-linus' into for-3.2/core
Tejun Heo [Mon, 17 Oct 2011 11:42:43 +0000 (13:42 +0200)]
block: make gendisk hold a reference to its queue
The following command sequence triggers an oops.
# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt
general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:
Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
RIP: 0010:[<
ffffffff810d0879>] [<
ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
Call Trace:
[<
ffffffff810d2845>] lock_acquire+0x95/0x140
[<
ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
[<
ffffffff811573bc>] bdi_lock_two+0x5c/0x70
[<
ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
[<
ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
[<
ffffffff811c4010>] __blkdev_put+0x160/0x1d0
[<
ffffffff811c40df>] blkdev_put+0x5f/0x190
[<
ffffffff8118f18d>] kill_block_super+0x4d/0x80
[<
ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
[<
ffffffff8119003a>] deactivate_super+0x4a/0x70
[<
ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
[<
ffffffff811acf2e>] sys_umount+0x7e/0x3a0
[<
ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
This is because bdev holds on to disk but disk doesn't pin the
associated queue. If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.
Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.
Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jeff Moyer [Mon, 17 Oct 2011 10:57:23 +0000 (12:57 +0200)]
blk-flush: move the queue kick into
A dm-multipath user reported[1] a problem when trying to boot
a kernel with commit
4853abaae7e4a2af938115ce9071ef8684fb7af4
(block: fix flush machinery for stacking drivers with differring
flush flags) applied. It turns out that an empty flush request
can be sent into blk_insert_flush. When the BUG_ON was fixed
to allow for this, I/O on the underlying device would stall. The
reason is that blk_insert_cloned_request does not kick the queue.
In the aforementioned commit, I had added a special case to
kick the queue if data was sent down but the queue flags did
not require a flush. A better solution is to push the queue
kick up into blk_insert_cloned_request.
This patch, along with a follow-on which fixes the BUG_ON, fixes
the issue reported.
[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html
Reported-by: Christophe Saout <christophe@saout.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jeff Moyer [Mon, 17 Oct 2011 10:57:22 +0000 (12:57 +0200)]
blk-flush: fix invalid BUG_ON in blk_insert_flush
A user reported a regression due to commit
4853abaae7e4a2af938115ce9071ef8684fb7af4 (block: fix flush
machinery for stacking drivers with differring flush flags).
Part of the problem is that blk_insert_flush required a
single bio be attached to the request. In reality, having
no attached bio is also a valid case, as can be observed with
an empty flush.
[1] http://www.redhat.com/archives/dm-devel/2011-September/msg00154.html
Reported-by: Christophe Saout <christophe@saout.de>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com
Acked-by: Tejun Heo <tj@kernel.org>
Stable note: 3.1
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tao Ma [Mon, 24 Oct 2011 14:11:30 +0000 (16:11 +0200)]
block: Remove the control of complete cpu from bio.
bio originally has the functionality to set the complete cpu, but
it is broken.
Chirstoph said that "This code is unused, and from the all the
discussions lately pretty obviously broken. The only thing keeping
it serves is creating more confusion and possibly more bugs."
And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
with leaving cpu control to the request based drivers, they are the
only ones that can toggle the setting anyway".
So this patch tries to remove all the work of controling complete cpu
from a bio.
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jie Liu [Mon, 24 Oct 2011 14:08:38 +0000 (16:08 +0200)]
block: fix a typo in the blk-cgroup.h file
byptes -> bytes.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Vrabel [Thu, 20 Oct 2011 19:24:30 +0000 (21:24 +0200)]
block: initialize the bounce pool if high memory may be added later
init_emergency_pool() does not create the page pool for bouncing block
requests if the current count of high pages is zero. If high memory
may be added later (either via memory hotplug or a balloon driver in a
virtualized system) then a oops occurs if a request with a high page
need bouncing because the pool does not exist.
So, always create the pool if memory hotplug is enabled and change the
test so it's valid even if all high pages are currently in the balloon
(the balloon drivers adjust totalhigh_pages but not max_pfn).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:42:16 +0000 (14:42 +0200)]
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
request_queue is refcounted but actually depdends on lifetime
management from the queue owner - on blk_cleanup_queue(), block layer
expects that there's no request passing through request_queue and no
new one will.
This is fundamentally broken. The queue owner (e.g. SCSI layer)
doesn't have a way to know whether there are other active users before
calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
guarantee that the queue is and would stay valid while it's holding a
reference.
With delay added in blk_queue_bio() before queue_lock is grabbed, the
following oops can be easily triggered when a device is removed with
in-flight IOs.
sd 0:0:1:0: [sdb] Stopping disk
ata1.01: disabled
general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:
Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
RIP: 0010:[<
ffffffff8137d651>] [<
ffffffff8137d651>] elv_rqhash_find+0x61/0x100
...
Process test_rawio (pid: 648, threadinfo
ffff880019efa000, task
ffff880019ef8a80)
...
Call Trace:
[<
ffffffff8137d774>] elv_merge+0x84/0xe0
[<
ffffffff81385b54>] blk_queue_bio+0xf4/0x400
[<
ffffffff813838ea>] generic_make_request+0xca/0x100
[<
ffffffff81383994>] submit_bio+0x74/0x100
[<
ffffffff811c53ec>] dio_bio_submit+0xbc/0xc0
[<
ffffffff811c610e>] __blockdev_direct_IO+0x92e/0xb40
[<
ffffffff811c39f7>] blkdev_direct_IO+0x57/0x60
[<
ffffffff8113b1c5>] generic_file_aio_read+0x6d5/0x760
[<
ffffffff8118c1ca>] do_sync_read+0xda/0x120
[<
ffffffff8118ce55>] vfs_read+0xc5/0x180
[<
ffffffff8118cfaa>] sys_pread64+0x9a/0xb0
[<
ffffffff81afaf6b>] system_call_fastpath+0x16/0x1b
This happens because blk_queue_cleanup() destroys the queue and
elevator whether IOs are in progress or not and DEAD tests are
sprinkled in the request processing path without proper
synchronization.
Similar problem exists for blk-throtl. On queue cleanup, blk-throtl
is shutdown whether it has requests in it or not. Depending on
timing, it either oopses or throttled bios are lost putting tasks
which are waiting for bio completion into eternal D state.
The way it should work is having the usual clear distinction between
shutdown and release. Shutdown drains all currently pending requests,
marks the queue dead, and performs partial teardown of the now
unnecessary part of the queue. Even after shutdown is complete,
reference holders are still allowed to issue requests to the queue
although they will be immmediately failed. The rest of teardown
happens on release.
This patch makes the following changes to make blk_queue_cleanup()
behave as proper shutdown.
* QUEUE_FLAG_DEAD is now set while holding both q->exit_mutex and
queue_lock.
* Unsynchronized DEAD check in generic_make_request_checks() removed.
This couldn't make any meaningful difference as the queue could die
after the check.
* blk_drain_queue() updated such that it can drain all requests and is
now called during cleanup.
* blk_throtl updated such that it checks DEAD on grabbing queue_lock,
drains all throttled bios during cleanup and free td when queue is
released.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:33:08 +0000 (14:33 +0200)]
block: drop @tsk from attempt_plug_merge() and explain sync rules
attempt_plug_merge() accesses elevator without holding queue_lock and
may call into ->elevator_bio_merge_fn(). The elvator is guaranteed to
be valid because it's accessed iff the plugged list has requests and
elevator is never exited with live requests, so as long as the
elevator method can deal with unlocked access, this is safe.
Explain the sync rules around attempt_plug_merge() and drop the
unnecessary @tsk parameter.
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:33:05 +0000 (14:33 +0200)]
block: make get_request[_wait]() fail if queue is dead
Currently get_request[_wait]() allocates request whether queue is dead
or not. This patch makes get_request[_wait]() return NULL if @q is
dead. blk_queue_bio() is updated to fail the submitted bio if request
allocation fails. While at it, add docbook comments for
get_request[_wait]().
Note that the current code has rather unclear (there are spurious DEAD
tests scattered around) assumption that the owner of a queue
guarantees that no request travels block layer if the queue is dead
and this patch in itself doesn't change much; however, this will allow
fixing the broken assumption in the next patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:33:01 +0000 (14:33 +0200)]
block: reorganize throtl_get_tg() and blk_throtl_bio()
blk_throtl_bio() and throtl_get_tg() have rather unusual interface.
* throtl_get_tg() returns pointer to a valid tg or ERR_PTR(-ENODEV),
and drops queue_lock in the latter case. Different locking context
depending on return value is error-prone and DEAD state is scheduled
to be protected by queue_lock anyway. Move DEAD check inside
queue_lock and return valid tg or NULL.
* blk_throtl_bio() indicates return status both with its return value
and in/out param **@bio. The former is used to indicate whether
queue is found to be dead during throtl processing. The latter
whether the bio is throttled.
There's no point in returning DEAD check result from
blk_throtl_bio(). The queue can die after blk_throtl_bio() is
finished but before make_request_fn() grabs queue lock.
Make it take *@bio instead and return boolean result indicating
whether the request is throttled or not.
This patch doesn't cause any visible functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:32:38 +0000 (14:32 +0200)]
block: reorganize queue draining
Reorganize queue draining related code in preparation of queue exit
changes.
* Factor out actual draining from elv_quiesce_start() to
blk_drain_queue().
* Make elv_quiesce_start/end() responsible for their own locking.
* Replace open-coded ELVSWITCH clearing in elevator_switch() with
elv_quiesce_end().
This patch doesn't cause any visible functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:31:25 +0000 (14:31 +0200)]
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
blk_get/put_queue() in scsi_cmd_ioctl() and throtl_get_tg() are
completely bogus. The caller must have a reference to the queue on
entry and taking an extra reference doesn't change anything.
For scsi_cmd_ioctl(), the only effect is that it ends up checking
QUEUE_FLAG_DEAD on entry; however, this is bogus as queue can die
right after blk_get_queue(). Dead queue should be and is handled in
request issue path (it's somewhat broken now but that's a separate
problem and doesn't affect this one much).
throtl_get_tg() incorrectly assumes that q is rcu freed. Also, it
doesn't check return value of blk_get_queue(). If the queue is
already dead, it ends up doing an extra put.
Drop them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:31:22 +0000 (14:31 +0200)]
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
blk_alloc_request() and freed_request() take different combinations of
REQ_* @flags, @priv and @is_sync when @flags is superset of the latter
two. Make them take @flags only. This cleans up the code a bit and
will ease updating allocation related REQ_* flags.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:31:18 +0000 (14:31 +0200)]
block: move blk_throtl prototypes to block/blk.h
blk_throtl interface is block internal and there's no reason to have
them in linux/blkdev.h. Move them to block/blk.h.
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:31:15 +0000 (14:31 +0200)]
block: fix genhd refcounting in blkio_policy_parse_and_set()
blkio_policy_parse_and_set() calls blkio_check_dev_num() to check
whether the given dev_t is valid. blkio_check_dev_num() uses
get_gendisk() for verification but never puts the returned genhd
leaking the reference.
This patch collapses blkio_check_dev_num() into its caller and updates
it such that the genhd is put before returning.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun Heo [Wed, 19 Oct 2011 12:31:07 +0000 (14:31 +0200)]
block: make gendisk hold a reference to its queue
The following command sequence triggers an oops.
# mount /dev/sdb1 /mnt
# echo 1 > /sys/class/scsi_device/0\:0\:1\:0/device/delete
# umount /mnt
general protection fault: 0000 [#1] PREEMPT SMP
CPU 2
Modules linked in:
Pid: 791, comm: umount Not tainted 3.1.0-rc3-work+ #8 Bochs Bochs
RIP: 0010:[<
ffffffff810d0879>] [<
ffffffff810d0879>] __lock_acquire+0x389/0x1d60
...
Call Trace:
[<
ffffffff810d2845>] lock_acquire+0x95/0x140
[<
ffffffff81aed87b>] _raw_spin_lock+0x3b/0x50
[<
ffffffff811573bc>] bdi_lock_two+0x5c/0x70
[<
ffffffff811c2f6c>] bdev_inode_switch_bdi+0x4c/0xf0
[<
ffffffff811c3fcb>] __blkdev_put+0x11b/0x1d0
[<
ffffffff811c4010>] __blkdev_put+0x160/0x1d0
[<
ffffffff811c40df>] blkdev_put+0x5f/0x190
[<
ffffffff8118f18d>] kill_block_super+0x4d/0x80
[<
ffffffff8118f4a5>] deactivate_locked_super+0x45/0x70
[<
ffffffff8119003a>] deactivate_super+0x4a/0x70
[<
ffffffff811ac4ad>] mntput_no_expire+0xed/0x130
[<
ffffffff811acf2e>] sys_umount+0x7e/0x3a0
[<
ffffffff81aeeeab>] system_call_fastpath+0x16/0x1b
This is because bdev holds on to disk but disk doesn't pin the
associated queue. If a SCSI device is removed while the device is
still open, the sdev puts the base reference to the queue on release.
When the bdev is finally released, the associated queue is already
gone along with the bdi and bdev_inode_switch_bdi() ends up
dereferencing already freed bdi.
Even if it were not for this bug, disk not holding onto the associated
queue is very unusual and error-prone.
Fix it by making add_disk() take an extra reference to its queue and
put it on disk_release() and ensuring that disk and its fops owner are
put in that order after all accesses to the disk and queue are
complete.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 19 Oct 2011 12:30:42 +0000 (14:30 +0200)]
Merge branch 'v3.1-rc10' into for-3.2/core
Conflicts:
block/blk-core.c
include/linux/blkdev.h
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Tue, 18 Oct 2011 04:06:23 +0000 (21:06 -0700)]
Linux 3.1-rc10
Linus Torvalds [Mon, 17 Oct 2011 15:24:24 +0000 (08:24 -0700)]
Avoid using variable-length arrays in kernel/sys.c
The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).
Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?). That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.
"Just don't do that, then".
Reported-by: Henrik Grindal Bakken <henribak@cisco.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Mon, 17 Oct 2011 10:57:20 +0000 (12:57 +0200)]
loop: remove the incorrect write_begin/write_end shortcut
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can. This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.
This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools. This in case causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.
The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation). Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to to
introduce a formal interface for this shortcut.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 16 Oct 2011 20:08:27 +0000 (13:08 -0700)]
Merge branch 'fixes' of ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
ARM: 7128/1: vic: Don't write to the read-only register VIC_IRQ_STATUS
ARM: 7122/1: localtimer: add header linux/errno.h explicitly
ARM: 7117/1: perf: fix HW_CACHE_* events on Cortex-A9
ARM: 7113/1: mm: Align bank start to MAX_ORDER_NR_PAGES
Zoltan Devai [Mon, 10 Oct 2011 13:54:12 +0000 (14:54 +0100)]
ARM: 7128/1: vic: Don't write to the read-only register VIC_IRQ_STATUS
This is unneeded and causes an abort on the SPMP8000 platform.
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Zoltan Devai <zoss@devai.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Shawn Guo [Thu, 6 Oct 2011 13:57:24 +0000 (14:57 +0100)]
ARM: 7122/1: localtimer: add header linux/errno.h explicitly
Per the text in Documentation/SubmitChecklist as below, we should
explicitly have header linux/errno.h in localtimer.h for ENXIO
reference.
1: If you use a facility then #include the file that defines/declares
that facility. Don't depend on other header files pulling in ones
that you use.
Otherwise, we may run into some compiling error like the following one,
if any file includes localtimer.h without CONFIG_LOCAL_TIMERS defined.
arch/arm/include/asm/localtimer.h: In function ‘local_timer_setup’:
arch/arm/include/asm/localtimer.h:53:10: error: ‘ENXIO’ undeclared (first use in this function)
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Mon, 3 Oct 2011 17:30:53 +0000 (18:30 +0100)]
ARM: 7117/1: perf: fix HW_CACHE_* events on Cortex-A9
Using COHERENT_LINE_{MISS,HIT} for cache misses and references
respectively is completely wrong. Instead, use the L1D events which
are a better and more useful approximation despite ignoring instruction
traffic.
Reported-by: Alasdair Grant <alasdair.grant@arm.com>
Reported-by: Matt Horsnell <matt.horsnell@arm.com>
Reported-by: Michael Williams <michael.williams@arm.com>
Cc: stable@kernel.org
Cc: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Fri, 14 Oct 2011 20:29:09 +0000 (08:29 +1200)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83627ehf) Properly report thermal diode sensors
Linus Torvalds [Fri, 14 Oct 2011 05:07:52 +0000 (17:07 +1200)]
Merge branch 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6
* 'gpio/merge' of git://git.secretlab.ca/git/linux-2.6:
gpio-pca953x: fix gpio_base
gpio/omap: fix build error with certain OMAP1 configs
Linus Torvalds [Fri, 14 Oct 2011 05:06:39 +0000 (17:06 +1200)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: revert to using a kthread for AIL pushing
xfs: force the log if we encounter pinned buffers in .iop_pushbuf
xfs: do not update xa_last_pushed_lsn for locked items
Linus Torvalds [Fri, 14 Oct 2011 04:59:11 +0000 (16:59 +1200)]
Merge branch 'stable' of git://github.com/cmetcalf-tilera/linux-tile
* 'stable' of git://github.com/cmetcalf-tilera/linux-tile:
tile: revert change from <asm/atomic.h> to <linux/atomic.h> in asm files
Linus Torvalds [Fri, 14 Oct 2011 04:54:56 +0000 (16:54 +1200)]
Merge branch 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86: Default to vsyscall=native for now
Mika Westerberg [Thu, 13 Oct 2011 09:04:20 +0000 (12:04 +0300)]
x86, mrst: use a temporary variable for SFI irq
SFI tables reside in RAM and should not be modified once they are
written. Current code went to set pentry->irq to zero which causes
subsequent reads to fail with invalid SFI table checksum. This will
break kexec as the second kernel fails to validate SFI tables.
To fix this we use temporary variable for irq number.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Thu, 13 Oct 2011 19:49:08 +0000 (15:49 -0400)]
hwmon: (w83627ehf) Properly report thermal diode sensors
The w83627ehf driver is improperly reporting thermal diode sensors as
type 2, instead of 3. This caused "sensors" and possibly other
monitoring tools to report these sensors as "transistor" instead of
"thermal diode".
Furthermore, diode subtype selection (CPU vs. external) is only
supported by the original W83627EHF/EHG. All later models only support
CPU diode type, and some (NCT6776F) don't even have the register in
question so we should avoid reading from it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Hartmut Knaack [Mon, 10 Oct 2011 22:22:45 +0000 (00:22 +0200)]
gpio-pca953x: fix gpio_base
gpio_base was set to 0 if no system platform data or open firmware
platform data was provided. This led to conflicts, if any other gpiochip
with a gpiobase of 0 was instantiated already. Setting it to -1 will
automatically use the first one available.
Signed-off-by: Hartmut Knaack <knaack.h@gmx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Janusz Krzysztofik [Tue, 23 Aug 2011 11:42:24 +0000 (13:42 +0200)]
gpio/omap: fix build error with certain OMAP1 configs
With commit
f64ad1a0e21a, "gpio/omap: cleanup _set_gpio_wakeup(), remove
ifdefs", access to build time conditionally omitted 'suspend_wakeup'
member of the 'gpio_bank' structure has been placed unconditionally in
function _set_gpio_wakeup(), which is always built. This resulted in the
driver compilation broken for certain OMAP1, i.e., non-OMAP16xx,
configurations.
Really required or not in previously excluded cases, define this
structure member unconditionally as a fix.
Tested with a custom OMAP1510 only configuration.
Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Chris Metcalf [Wed, 5 Oct 2011 21:09:29 +0000 (17:09 -0400)]
tile: revert change from <asm/atomic.h> to <linux/atomic.h> in asm files
The 32-bit TILEPro support uses some #defines in <asm/atomic_32.h>
for atomic support routines in assembly. To make this more explicit,
I've turned those includes into includes of <asm/atomic_32.h>, which
should hopefully make it clear that they shouldn't be bombed into
<linux/atomic.h> in any cleanups.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Linus Torvalds [Thu, 13 Oct 2011 06:25:45 +0000 (18:25 +1200)]
Merge git://git./linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
mscan: too much data copied to CAN frame due to 16 bit accesses
gro: refetch inet6_protos[] after pulling ext headers
bnx2x: fix cl_id allocation for non-eth clients for NPAR mode
mlx4_en: fix endianness with blue frame support
Johann Felix Soden [Mon, 10 Oct 2011 18:37:00 +0000 (11:37 -0700)]
ide: Fix file references in drivers/ide/
Fix file references in drivers/ide/
There are a lot of file references to now moved or deleted files in the
whole tree, especially in documentation and Kconfig files. This patch
fixes the references in drivers/ide/.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 Oct 2011 06:20:40 +0000 (18:20 +1200)]
Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
Btrfs: make sure not to defrag extents past i_size
Btrfs: fix recursive auto-defrag
Christoph Hellwig [Tue, 11 Oct 2011 15:14:10 +0000 (11:14 -0400)]
xfs: revert to using a kthread for AIL pushing
Currently we have a few issues with the way the workqueue code is used to
implement AIL pushing:
- it accidentally uses the same workqueue as the syncer action, and thus
can be prevented from running if there are enough sync actions active
in the system.
- it doesn't use the HIGHPRI flag to queue at the head of the queue of
work items
At this point I'm not confident enough in getting all the workqueue flags and
tweaks right to provide a perfectly reliable execution context for AIL
pushing, which is the most important piece in XFS to make forward progress
when the log fills.
Revert back to use a kthread per filesystem which fixes all the above issues
at the cost of having a task struct and stack around for each mounted
filesystem. In addition this also gives us much better ways to diagnose
any issues involving hung AIL pushing and removes a small amount of code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 11 Oct 2011 15:14:09 +0000 (15:14 +0000)]
xfs: force the log if we encounter pinned buffers in .iop_pushbuf
We need to check for pinned buffers even in .iop_pushbuf given that inode
items flush into the same buffers that may be pinned directly due operations
on the unlinked inode list operating directly on buffers. To do this add a
return value to .iop_pushbuf that tells the AIL push about this and use
the existing log force mechanisms to unpin it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Tue, 11 Oct 2011 15:14:08 +0000 (15:14 +0000)]
xfs: do not update xa_last_pushed_lsn for locked items
If an item was locked we should not update xa_last_pushed_lsn and thus skip
it when restarting the AIL scan as we need to be able to lock and write it
out as soon as possible. Otherwise heavy lock contention might starve AIL
pushing too easily, especially given the larger backoff once we moved
xa_last_pushed_lsn all the way to the target lsn.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Chris Mason [Tue, 11 Oct 2011 15:41:40 +0000 (11:41 -0400)]
Btrfs: make sure not to defrag extents past i_size
The btrfs file defrag code will loop through the extents and
force COW on them. But there is a concurrent truncate in the middle of
the defrag, it might end up defragging the same range over and over
again.
The problem is that writepage won't go through and do anything on pages
past i_size, so the cow won't happen, so the file will appear to still
be fragmented. defrag will end up hitting the same extents again and
again.
In the worst case, the truncate can actually live lock with the defrag
because the defrag keeps creating new ordered extents which the truncate
code keeps waiting on.
The fix here is to make defrag check for i_size inside the main loop,
instead of just once before the looping starts.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Adrian Bunk [Wed, 5 Oct 2011 21:40:47 +0000 (00:40 +0300)]
x86: Default to vsyscall=native for now
This UML breakage:
linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:
ffffffffff600000 cs:33 sp:
7fbfb9c498 ax:
ffffffffff600000 si:0 di:606790
linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:
ffffffffff600000 cs:33 sp:
7fbfb13168 ax:
ffffffffff600000 si:0 di:606790
Is caused by commit
3ae36655 ("x86-64: Rework vsyscall emulation and add
vsyscall= parameter") - the vsyscall emulation code is not fully cooked
yet as UML relies on some rather fragile SIGSEGV semantics.
Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default
to vsyscall=native for now, this patch implements that.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Andrew Lutomirski <luto@mit.edu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Mon, 10 Oct 2011 19:43:34 +0000 (15:43 -0400)]
Btrfs: fix recursive auto-defrag
Follow those steps:
# mount -o autodefrag /dev/sda7 /mnt
# dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
# sync
# dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
and then it'll go into a loop: writeback -> defrag -> writeback ...
It's because writeback writes [8K, 200K] and then writes [0, 8K].
I tried to make writeback know if the pages are dirtied by defrag,
but the patch was a bit intrusive. Here I simply set writeback_index
when we defrag a file.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Wolfgang Grandegger [Fri, 7 Oct 2011 09:28:14 +0000 (09:28 +0000)]
mscan: too much data copied to CAN frame due to 16 bit accesses
Due to the 16 bit access to mscan registers there's too much data copied to
the zero initialized CAN frame when having an odd number of bytes to copy.
This patch ensures that only the requested bytes are copied by using an
8 bit access for the remaining byte.
Reported-by: Andre Naujoks <nautsch@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan, Zheng [Sat, 8 Oct 2011 22:34:35 +0000 (22:34 +0000)]
gro: refetch inet6_protos[] after pulling ext headers
ipv6_gro_receive() doesn't update the protocol ops after pulling
the ext headers. It looks like a typo.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 9 Oct 2011 23:57:36 +0000 (23:57 +0000)]
bnx2x: fix cl_id allocation for non-eth clients for NPAR mode
There are some consolidations of NPAR configuration
when FCoE and iSCSI L2 clients will get the same id,
in this case FCoE ring will be non-functional.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thadeu Lima de Souza Cascardo [Mon, 10 Oct 2011 06:42:23 +0000 (06:42 +0000)]
mlx4_en: fix endianness with blue frame support
The doorbell register was being unconditionally swapped. In x86, that
meant it was being swapped to BE and written to the descriptor and to
memory, depending on the case of blue frame support or writing to
doorbell register. On PPC, this meant it was being swapped to LE and
then swapped back to BE while writing to the register. But in the blue
frame case, it was being written as LE to the descriptor.
The fix is not to swap doorbell unconditionally, write it to the
register as BE and convert it to BE when writing it to the descriptor.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reported-by: Richard Hendrickson <richhend@us.ibm.com>
Cc: Eli Cohen <eli@dev.mellanox.co.il>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 10 Oct 2011 02:53:11 +0000 (14:53 +1200)]
Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2
Linus Torvalds [Mon, 10 Oct 2011 02:48:27 +0000 (14:48 +1200)]
Merge branch 'fixes' of git://git.linaro.org/people/arnd/arm-soc
* 'fixes' of git://git.linaro.org/people/arnd/arm-soc:
ARM: mach-ux500: enable fix for ARM errata 754322
ARM: OMAP: musb: Remove a redundant omap4430_phy_init call in usb_musb_init
ARM: OMAP: Fix i2c init for twl4030
ARM: OMAP4: MMC: fix power and audio issue, decouple USBC1 from MMC1
Marc Dietrich [Fri, 7 Oct 2011 15:31:41 +0000 (08:31 -0700)]
ARM: tegra: fix compilation error due to mach/hardware.h removal
This fixes a compilation error in cpu-tegra.c which was introduced in
dc8d966bccde ("ARM: convert PCI defines to variables") which removed the
now obsolete mach/hardware.h from the mach-tegra subtree.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 10 Oct 2011 02:43:06 +0000 (14:43 +1200)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/kms: use hardcoded dig encoder to transmitter mapping for DCE4.1
drm/radeon/kms: fix dp_detect handling for DP bridge chips
drm/radeon/kms: retry aux transactions if there are status flags
Olof Johansson [Fri, 7 Oct 2011 20:27:48 +0000 (13:27 -0700)]
MAINTAINERS: Update tegra maintainer information
A couple of changes to the Tegra maintainership setup:
I'm very glad to bring on Stephen Warren on board as a maintainer. The
work he has done so far is excellent, and the fact that he works for
Nvidia means he has long-term interest in the platform.
Erik Gilling did an astounding amount of work on getting things up and
running but has been a silent partner on the maintainership side for a
while, and is stepping down. Thanks for your contributions so far, Erik.
Finally, update the git URL since I'll take over running the main repo
for a while.
Overall maintainership model isn't changing much at this time: We'll all
three review patches as appropriate, and one of us will collect the main
repo (me at this time).
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Erik Gilling <konkers@android.com>
Acked-by: Colin Cross <ccross@android.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 10 Oct 2011 02:39:03 +0000 (14:39 +1200)]
Merge branch 'upstream' of git://git.linux-mips.org/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (29 commits)
MIPS: Call oops_enter, oops_exit in die
staging/octeon: Software should check the checksum of no tcp/udp packets
MIPS: Octeon: Enable C0_UserLocal probing.
MIPS: No branches in delay slots for huge pages in handle_tlbl
MIPS: Don't clobber CP0_STATUS value for CONFIG_MIPS_MT_SMTC
MIPS: Octeon: Select CONFIG_HOLES_IN_ZONE
MIPS: PM: Use struct syscore_ops instead of sysdevs for PM (v2)
MIPS: Compat: Use 32-bit wrapper for compat_sys_futex.
MIPS: Do not use EXTRA_CFLAGS
MIPS: Alchemy: DB1200: Disable cascade IRQ in handler
SERIAL: Lantiq: Set timeout in uart_port
MIPS: Lantiq: Fix setting the PCI bus speed on AR9
MIPS: Lantiq: Fix external interrupt sources
MIPS: tlbex: Fix build error in R3000 code.
MIPS: Alchemy: Include Au1100 in PM code.
MIPS: Alchemy: Fix typo in MAC0 registration
MIPS: MSP71xx: Fix build error.
MIPS: Handle __put_user() sleeping.
MIPS: Allow forced irq threading
MIPS: i8259: Mark cascade interrupt non-threaded
...
Arnd Bergmann [Sat, 8 Oct 2011 20:21:07 +0000 (22:21 +0200)]
Merge branch 'omap/fixes-for-3.1' into fixes
Steve French [Fri, 7 Oct 2011 04:14:07 +0000 (23:14 -0500)]
[CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2
Microsoft has a bug with ntlmv2 that requires use of ntlmssp, but
we didn't get the required information on when/how to use ntlmssp to
old (but once very popular) legacy servers (various NT4 fixpacks
for example) until too late to merge for 3.1. Will upgrade
to NTLMv2 in NTLMSSP in 3.2
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
srinidhi kasagar [Tue, 20 Sep 2011 05:45:46 +0000 (11:15 +0530)]
ARM: mach-ux500: enable fix for ARM errata 754322
This applies ARM errata fix 754322 for all ux500 platforms.
Cc: stable@kernel.org
Signed-off-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Thu, 6 Oct 2011 23:15:10 +0000 (16:15 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
net: fix typos in Documentation/networking/scaling.txt
bridge: leave carrier on for empty bridge
netfilter: Use proper rwlock init function
tcp: properly update lost_cnt_hint during shifting
tcp: properly handle md5sig_pool references
macvlan/macvtap: Fix unicast between macvtap interfaces in bridge mode
Paul Menzel [Wed, 31 Aug 2011 15:07:10 +0000 (17:07 +0200)]
x86/PCI: use host bridge _CRS info on ASUS M2V-MX SE
In summary, this DMI quirk uses the _CRS info by default for the ASUS
M2V-MX SE by turning on `pci=use_crs` and is similar to the quirk
added by commit
2491762cfb47 ("x86/PCI: use host bridge _CRS info on
ASRock ALiveSATA2-GLAN") whose commit message should be read for further
information.
Since commit
3e3da00c01d0 ("x86/pci: AMD one chain system to use pci
read out res") Linux gives the following oops:
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
HDA Intel 0000:20:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
HDA Intel 0000:20:01.0: setting latency timer to 64
BUG: unable to handle kernel paging request at
ffffc90011c08000
IP: [<
ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
PGD
13781a067 PUD
13781b067 PMD
1300ba067 PTE
800000fd00000173
Oops: 0009 [#1] SMP
last sysfs file: /sys/module/snd_pcm/initstate
CPU 0
Modules linked in: snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event tpm_tis tpm snd_seq tpm_bios psmouse parport_pc snd_timer snd_seq_device parport processor evdev snd i2c_viapro thermal_sys amd64_edac_mod k8temp i2c_core soundcore shpchp pcspkr serio_raw asus_atk0110 pci_hotplug edac_core button snd_page_alloc edac_mce_amd ext3 jbd mbcache sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod raid1 md_mod usbhid hid sg sd_mod crc_t10dif sr_mod cdrom ata_generic uhci_hcd sata_via pata_via libata ehci_hcd usbcore scsi_mod via_rhine mii nls_base [last unloaded: scsi_wait_scan]
Pid: 1153, comm: work_for_cpu Not tainted 2.6.37-1-amd64 #1 M2V-MX SE/System Product Name
RIP: 0010:[<
ffffffffa0578402>] [<
ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
RSP: 0018:
ffff88013153fe50 EFLAGS:
00010286
RAX:
ffffc90011c08000 RBX:
ffff88013029ec00 RCX:
0000000000000006
RDX:
0000000000000000 RSI:
0000000000000246 RDI:
0000000000000246
RBP:
ffff88013341d000 R08:
0000000000000000 R09:
0000000000000040
R10:
0000000000000286 R11:
0000000000003731 R12:
ffff88013029c400
R13:
0000000000000000 R14:
0000000000000000 R15:
ffff88013341d090
FS:
0000000000000000(0000) GS:
ffff8800bfc00000(0000) knlGS:
00000000f7610ab0
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
ffffc90011c08000 CR3:
0000000132f57000 CR4:
00000000000006f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
Process work_for_cpu (pid: 1153, threadinfo
ffff88013153e000, task
ffff8801303c86c0)
Stack:
0000000000000005 ffffffff8123ad65 00000000000136c0 ffff88013029c400
ffff8801303c8998 ffff88013341d000 ffff88013341d090 ffff8801322d9dc8
ffff88013341d208 0000000000000000 0000000000000000 ffffffff811ad232
Call Trace:
[<
ffffffff8123ad65>] ? __pm_runtime_set_status+0x162/0x186
[<
ffffffff811ad232>] ? local_pci_probe+0x49/0x92
[<
ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
[<
ffffffff8105afc5>] ? do_work_for_cpu+0x0/0x1b
[<
ffffffff8105afd0>] ? do_work_for_cpu+0xb/0x1b
[<
ffffffff8105fd3f>] ? kthread+0x7a/0x82
[<
ffffffff8100a824>] ? kernel_thread_helper+0x4/0x10
[<
ffffffff8105fcc5>] ? kthread+0x0/0x82
[<
ffffffff8100a820>] ? kernel_thread_helper+0x0/0x10
Code: f4 01 00 00 ef 31 f6 48 89 df e8 29 dd ff ff 85 c0 0f 88 2b 03 00 00 48 89 ef e8 b4 39 c3 e0 8b 7b 40 e8 fc 9d b1 e0 48 8b 43 38 <66> 8b 10 66 89 14 24 8b 43 14 83 e8 03 83 f8 01 77 32 31 d2 be
RIP [<
ffffffffa0578402>] azx_probe+0x3ad/0x86b [snd_hda_intel]
RSP <
ffff88013153fe50>
CR2:
ffffc90011c08000
---[ end trace
8d1f3ebc136437fd ]---
Trusting the ACPI _CRS information (`pci=use_crs`) fixes this problem.
$ dmesg | grep -i crs # with the quirk
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
The match has to be against the DMI board entries though since the vendor entries are not populated.
DMI: System manufacturer System Product Name/M2V-MX SE, BIOS 0304 10/30/2007
This quirk should be removed when `pci=use_crs` is enabled for machines
from 2006 or earlier or some other solution is implemented.
Using coreboot [1] with this board the problem does not exist but this
quirk also does not affect it either. To be safe though the check is
tightened to only take effect when the BIOS from American Megatrends is
used.
15:13 < ruik> but coreboot does not need that
15:13 < ruik> because i have there only one root bus
15:13 < ruik> the audio is behind a bridge
$ sudo dmidecode
BIOS Information
Vendor: American Megatrends Inc.
Version: 0304
Release Date: 10/30/2007
[1] http://www.coreboot.org/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=30552
Cc: stable@kernel.org (2.6.34)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Poirier [Tue, 4 Oct 2011 04:00:30 +0000 (04:00 +0000)]
net: fix typos in Documentation/networking/scaling.txt
The second hunk fixes rps_sock_flow_table but has to re-wrap the paragraph.
Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 3 Oct 2011 18:14:45 +0000 (18:14 +0000)]
bridge: leave carrier on for empty bridge
This resolves a regression seen by some users of bridging.
Some users use the bridge like a dummy device.
They expect to be able to put an IPv6 address on the device
with no ports attached. Although there are better ways of doing
this, there is no reason to not allow it.
Note: the bridge still will reflect the state of ports in the
bridge if there are any added.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 6 Oct 2011 15:31:47 +0000 (08:31 -0700)]
Merge branch 'for-linus' of people.redhat.com/agk/git/linux-dm
* 'for-linus' of http://people.redhat.com/agk/git/linux-dm:
dm crypt: always disable discard_zeroes_data
dm: raid fix write_mostly arg validation
dm table: avoid crash if integrity profile changes
dm: flakey fix corrupt_bio_byte error path
Linus Torvalds [Thu, 6 Oct 2011 15:30:03 +0000 (08:30 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: Avoid waking up a thread after it has been freed.
Alex Deucher [Wed, 5 Oct 2011 22:36:50 +0000 (18:36 -0400)]
drm/radeon/kms: use hardcoded dig encoder to transmitter mapping for DCE4.1
The encoders are supposedly fully routeable, but changing the mapping
doesn't always seem to take. Using a hardcoded mapping is much more
reliable.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=41366
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Gleixner [Wed, 5 Oct 2011 03:24:43 +0000 (03:24 +0000)]
netfilter: Use proper rwlock init function
Replace the open coded initialization with the init function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 5 Oct 2011 16:22:38 +0000 (09:22 -0700)]
Merge branch 'for-linus' of git://github.com/dtor/input
* 'for-linus' of git://github.com/dtor/input:
Input: wacom - revert "Cintiq 21UX2 does not have menu strips"
Linus Torvalds [Wed, 5 Oct 2011 16:16:11 +0000 (09:16 -0700)]
Merge git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
[SCSI] libsas: fix panic when single phy is disabled on a wide port
[SCSI] qla2xxx: Fix crash in qla2x00_abort_all_cmds() on unload
Alex Deucher [Tue, 4 Oct 2011 16:23:24 +0000 (12:23 -0400)]
drm/radeon/kms: fix dp_detect handling for DP bridge chips
The HPD pin is not reliable for detecting whether a monitor
is connected or not. Skip HPD and just use DDC or load
detection.
Fixes phantom VGA connected bugs.
[Michel: fixes phantom VGA bugs on his llano system.]
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Tue, 4 Oct 2011 21:23:15 +0000 (17:23 -0400)]
drm/radeon/kms: retry aux transactions if there are status flags
If there are error flags in the aux status, retry the transaction.
This makes aux much more reliable, especially on llano systems.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Gerecke [Wed, 5 Oct 2011 05:50:45 +0000 (22:50 -0700)]
Input: wacom - revert "Cintiq 21UX2 does not have menu strips"
This reverts commit
71c86ce59791bcd67af937bbea719a508079d7c2.
The 21UX2 does have touchstrips, but they are in a somewhat-
hidden location.
Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Acked-by: Ping Cheng <pinglinux@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Yan, Zheng [Sun, 2 Oct 2011 04:21:50 +0000 (04:21 +0000)]
tcp: properly update lost_cnt_hint during shifting
lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb.
lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint;
When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is ture,
the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false,
tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not
need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to
lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost().
So tcp_shifted_skb() should adjust the lost_cnt_hint even tcp_is_fack(tp) is true.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan, Zheng [Thu, 29 Sep 2011 17:10:10 +0000 (17:10 +0000)]
tcp: properly handle md5sig_pool references
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
only hold one reference to md5sig_pool. but tcp_v4_md5_do_add()
increases use count of md5sig_pool for each peer. This patch
makes tcp_v4_md5_do_add() only increases use count for the first
tcp md5sig peer.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ward [Sun, 18 Sep 2011 12:53:20 +0000 (12:53 +0000)]
macvlan/macvtap: Fix unicast between macvtap interfaces in bridge mode
Packets should always be forwarded to the lowerdev using dev_forward_skb.
vlan->forward is for packets being forwarded directly to another macvlan/
macvtap device (used for multicast in bridge mode).
Reported-and-tested-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 5 Oct 2011 01:11:50 +0000 (18:11 -0700)]
Linux 3.1-rc9
Linus Torvalds [Tue, 4 Oct 2011 17:37:06 +0000 (10:37 -0700)]
Merge git://github.com/davem330/net
* git://github.com/davem330/net:
pch_gbe: Fixed the issue on which a network freezes
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
net: xen-netback: correctly restart Tx after a VM restore/migrate
bonding: properly stop queuing work when requested
can bcm: fix incomplete tx_setup fix
RDSRDMA: Fix cleanup of rds_iw_mr_pool
net: Documentation: Fix type of variables
ibmveth: Fix oops on request_irq failure
ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
cxgb4: Fix EEH on IBM P7IOC
can bcm: fix tx_setup off-by-one errors
MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
dp83640: reduce driver noise
ptp: fix L2 event message recognition
Linus Torvalds [Tue, 4 Oct 2011 16:59:22 +0000 (09:59 -0700)]
Merge branch 'fix/asoc' of git://github.com/tiwai/sound
* 'fix/asoc' of git://github.com/tiwai/sound:
ASoC: omap_mcpdm_remove cannot be __devexit
ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
ASoC: use a valid device for dev_err() in Zylonite
Linus Torvalds [Tue, 4 Oct 2011 16:54:18 +0000 (09:54 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/kms: fix channel_remap setup (v2)
drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
drm/radeon: Simplify cursor x/yorigin calculation.
drm/radeon/kms: fix cursor image off-by-one error
drm/radeon/kms: Fix logic error in DP HPD handler
drm/radeon/kms: add retry limits for native DP aux defer
drm/radeon/kms: fix regression in DP aux defer handling
Linus Torvalds [Tue, 4 Oct 2011 16:52:56 +0000 (09:52 -0700)]
Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
spi-topcliff-pch: Fix overrun issue
spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
spi-topcliff-pch: Fix CPU read complete condition issue
spi-topcliff-pch: Fix SSN Control issue
spi-topcliff-pch: add tx-memory clear after complete transmitting
Jon Mason [Mon, 3 Oct 2011 14:50:20 +0000 (09:50 -0500)]
PCI: Disable MPS configuration by default
Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.
Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Deucher [Tue, 4 Oct 2011 14:46:34 +0000 (10:46 -0400)]
drm/radeon/kms: fix channel_remap setup (v2)
Most asics just use the hw default value which requires
no explicit programming. For those that need a different
value, the vbios will program it properly. As such,
there's no need to program these registers explicitly
in the driver. Changing MC_SHARED_CHREMAP requires a reload
of all data in vram otherwise its contents will be scambled.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=40103
v2: drop now unused channel_remap functions.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:38 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix overrun issue
We found that adding load, Rx data sometimes drops.(with DMA transfer mode)
The cause is that before starting Rx-DMA processing, Tx-DMA processing starts.
This causes FIFO overrun occurs.
This patch fixes the issue by modifying FIFO tx-threshold and DMA descriptor
size like below.
Current this patch
Rx-descriptor 4Byte+12Byte*341 --> 12Byte*340-4Byte-12Byte
Rx-threshold (Not modified)
Tx-descriptor 4Byte+12Byte*341 --> 16Byte-12Byte*340
Rx-threshold 12Byte --> 2Byte
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:37 +0000 (17:16 +0900)]
spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
Add recovery processing in case FIFO overrun error occurs with DMA transfer mode.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:36 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix CPU read complete condition issue
We found Rx data sometimes drops.(with non-DMA transfer mode)
The cause is read complete condition is not true.
This patch fixes the issue.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:35 +0000 (17:16 +0900)]
spi-topcliff-pch: Fix SSN Control issue
During processing 1 command/data series,
SSN should keep LOW.
However, currently, SSN becomes HIGH.
This patch fixes the issue.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tomoya MORINAGA [Tue, 6 Sep 2011 08:16:34 +0000 (17:16 +0900)]
spi-topcliff-pch: add tx-memory clear after complete transmitting
Currently, in case of reading date from SPI flash,
command is sent twice.
The cause is that tx-memory clear processing is missing .
This patch adds the tx-momory clear processing.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Takashi Iwai [Tue, 4 Oct 2011 01:09:14 +0000 (18:09 -0700)]
lis3: fix regression of HP DriveGuard with 8bit chip
Commit
2a7fade7e03 ("hwmon: lis3: Power on corrections") caused a
regression on HP laptops with 8bit chip. Writing CTRL2_BOOT_8B bit seems
clearing the BIOS setup, and no proper interrupt for DriveGuard will be
triggered any more.
Since the init code there is basically only for embedded devices, put a
pdata check so that the problematic initialization will be skipped for
hp_accel stuff.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 3 Oct 2011 19:54:56 +0000 (12:54 -0700)]
Merge branch 'hwmon-for-linus' of git://github.com/groeck/linux
* 'hwmon-for-linus' of git://github.com/groeck/linux:
hwmon: (coretemp) Avoid leaving around dangling pointer
hwmon: (coretemp) Fixup platform device ID change
Linus Torvalds [Mon, 3 Oct 2011 19:53:43 +0000 (12:53 -0700)]
Merge git://github.com/davem330/ide
* git://github.com/davem330/ide:
ide-disk: Fix request requeuing
Linus Torvalds [Mon, 3 Oct 2011 19:17:44 +0000 (12:17 -0700)]
Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
Btrfs: force a page fault if we have a shorty copy on a page boundary
Borislav Petkov [Mon, 3 Oct 2011 18:28:18 +0000 (14:28 -0400)]
ide-disk: Fix request requeuing
Simon Kirby reported that on his RAID setup with idedisk underneath
the box OOMs after a couple of days of runtime. Running with
CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally
allocates an ide_cmd struct. However, ide_requeue_and_plug() can be
called more than once per request, either from the request issue or the
IRQ handler path and do blk_peek_request() ends up in idedisk_prep_fn()
repeatedly, allocating a struct ide_cmd everytime and "forgetting" the
previous pointer.
Make sure the code reuses the old allocated chunk.
Reported-and-tested-by: Simon Kirby <sim@hostway.ca>
Cc: <stable@kernel.org> [ 39.x, 3.0.x ]
Link: http://marc.info/?l=linux-kernel&m=131667641517919
Link: http://lkml.kernel.org/r/20110922072643.GA27232@hostway.ca
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiharu Okada [Sun, 25 Sep 2011 21:27:43 +0000 (21:27 +0000)]
pch_gbe: Fixed the issue on which a network freezes
The pch_gbe driver has an issue which a network stops,
when receiving traffic is high.
In the case, The link down and up are necessary to return a network.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiharu Okada [Sun, 25 Sep 2011 21:27:42 +0000 (21:27 +0000)]
pch_gbe: Fixed the issue on which PC was frozen when link was downed.
When a link was downed during network use,
there is an issue on which PC freezes.
This patch fixed this issue.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Sep 2011 10:38:28 +0000 (10:38 +0000)]
make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
This is a minor change.
Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
...) would return total and dropped packets since its last invocation. The
introduction of socket queue overflow reporting [1] changed drop
rate calculation in the normal packet socket path, but not when using a
packet ring. As a result, the getsockopt now returns different statistics
depending on the reception method used. With a ring, it still returns the
count since the last call, as counts are incremented in tpacket_rcv and
reset in getsockopt. Without a ring, it returns 0 if no drops occurred
since the last getsockopt and the total drops over the lifespan of
the socket otherwise. The culprit is this line in packet_rcv, executed
on a drop:
drop_n_acct:
po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);
As it shows, the new drop number it taken from the socket drop counter,
which is not reset at getsockopt. I put together a small example
that demonstrates the issue [2]. It runs for 10 seconds and overflows
the queue/ring on every odd second. The reported drop rates are:
ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0 , 74.
Note how the even ring counts monotonically increase. Because the
getsockopt adds tp_drops to tp_packets, total counts are similarly
reported cumulatively. Long story short, reinstating the original code, as
the below patch does, fixes the issue at the cost of additional per-packet
cycles. Another solution that does not introduce per-packet overhead
is be to keep the current data path, record the value of sk_drops at
getsockopt() at call N in a new field in struct packetsock and subtract
that when reporting at call N+1. I'll be happy to code that, instead,
it's just more messy.
[1] http://patchwork.ozlabs.org/patch/35665/
[2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Vrabel [Fri, 30 Sep 2011 06:37:51 +0000 (06:37 +0000)]
net: xen-netback: correctly restart Tx after a VM restore/migrate
If a VM is saved and restored (or migrated) the netback driver will no
longer process any Tx packets from the frontend. xenvif_up() does not
schedule the processing of any pending Tx requests from the front end
because the carrier is off. Without this initial kick the frontend
just adds Tx requests to the ring without raising an event (until the
ring is full).
This was caused by
47103041e91794acdbc6165da0ae288d844c820b (net:
xen-netback: convert to hw_features) which reordered the calls to
xenvif_up() and netif_carrier_on() in xenvif_connect().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Fri, 23 Sep 2011 10:53:34 +0000 (10:53 +0000)]
bonding: properly stop queuing work when requested
During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.
There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.
This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Liang Zheng <lzheng@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>