Al Viro [Sun, 10 Jun 2012 22:09:36 +0000 (18:09 -0400)]
don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 22:05:36 +0000 (18:05 -0400)]
don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 21:17:17 +0000 (17:17 -0400)]
fs/namei.c: don't pass nameidata to __lookup_hash() and lookup_real()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 21:13:09 +0000 (17:13 -0400)]
stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 22 Jun 2012 08:42:10 +0000 (12:42 +0400)]
fs/namei.c: don't pass nameidata to lookup_dcache()
just the flags...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 20:10:59 +0000 (16:10 -0400)]
fs/namei.c: don't pass nameidata to d_revalidate()
since the method wrapped by it doesn't need that anymore...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 20:03:43 +0000 (16:03 -0400)]
stop passing nameidata * to ->d_revalidate()
Just the lookup flags. Die, bastard, die...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 19:36:40 +0000 (15:36 -0400)]
fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 19:33:51 +0000 (15:33 -0400)]
nfs_lookup_verify_inode() - nd is *always* non-NULL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 19:18:15 +0000 (15:18 -0400)]
switch nfs_lookup_check_intent() away from nameidata
just pass the flags
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 18:32:45 +0000 (14:32 -0400)]
do_dentry_open(): take initialization of file->f_path to caller
... and get rid of a couple of arguments and a pointless reassignment
in finish_open() case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 18:24:38 +0000 (14:24 -0400)]
fold __dentry_open() into its sole caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 18:22:04 +0000 (14:22 -0400)]
switch do_dentry_open() to returning int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 10:48:09 +0000 (06:48 -0400)]
make finish_no_open() return int
namely, 1 ;-) That's what we want to return from ->atomic_open()
instances after finish_no_open().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 22 Jun 2012 08:41:10 +0000 (12:41 +0400)]
fs/namei.c: get do_last() and friends return int
Same conventions as for ->atomic_open(). Trimmed the
forest of labels a bit, while we are at it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 22 Jun 2012 08:40:19 +0000 (12:40 +0400)]
kill struct opendata
Just pass struct file *. Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int. Next: saner prototypes for parts in
namei.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 09:55:37 +0000 (05:55 -0400)]
kill opendata->{mnt,dentry}
->filp->f_path is there for purpose...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 22 Jun 2012 08:39:14 +0000 (12:39 +0400)]
make ->atomic_open() return int
Change of calling conventions:
    old             new
    NULL            1
    file            0
    ERR_PTR(-ve)    -ve
Caller *knows* that struct file *; no need to return it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
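The old/new table in the ->atomic_open() commit above can be read as a simple mapping. The following is a minimal userspace sketch of that mapping only, not kernel code; the enum and the function name new_convention() are hypothetical and exist purely for illustration.

```c
#include <errno.h>

/* Hypothetical labels for the three outcomes of the old,
 * pointer-returning convention. */
enum old_result { OLD_NULL, OLD_FILE, OLD_ERR };

/*
 * Map an old-style outcome to the new int convention from the table:
 *   NULL         -> 1    (file not opened by the method)
 *   file         -> 0    (opened; the caller already has the struct file)
 *   ERR_PTR(-ve) -> -ve  (error code passed through unchanged)
 */
static int new_convention(enum old_result r, int err)
{
	switch (r) {
	case OLD_NULL:
		return 1;
	case OLD_FILE:
		return 0;
	case OLD_ERR:
		return err;
	}
	return -EINVAL;	/* not reachable for valid enum values */
}
```

For example, new_convention(OLD_ERR, -ENOENT) yields -ENOENT, while the two success-side outcomes collapse to 0 and 1.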
Al Viro [Sun, 10 Jun 2012 09:04:43 +0000 (05:04 -0400)]
don't modify od->filp at all
make put_filp() conditional on flag set by finish_open()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 09:01:45 +0000 (05:01 -0400)]
->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.
[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:32 +0000 (15:10 +0200)]
vfs: move O_DIRECT check to common code
Perform open_check_o_direct() in a common place in do_last after opening the
file.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:31 +0000 (15:10 +0200)]
vfs: do_last(): clean up retry
Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:30 +0000 (15:10 +0200)]
vfs: do_last(): clean up bool
Consistently use bool for boolean values in do_last().
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:29 +0000 (15:10 +0200)]
vfs: do_last(): clean up labels
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:28 +0000 (15:10 +0200)]
vfs: do_last(): clean up error handling
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:27 +0000 (15:10 +0200)]
vfs: remove open intents from nameidata
All users of open intents have been converted to use ->atomic_{open,create}.
This patch gets rid of nd->intent.open and related infrastructure.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:26 +0000 (15:10 +0200)]
9p: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create. No functionality is changed.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:25 +0000 (15:10 +0200)]
ceph: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:24 +0000 (15:10 +0200)]
ceph: remove unused arg from ceph_lookup_open()
What was the purpose of this?
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:23 +0000 (15:10 +0200)]
cifs: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic lookup+open+create
operation implemented via ->lookup and ->create operations.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steve French <sfrench@samba.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:22 +0000 (15:10 +0200)]
fuse: implement i_op->atomic_open()
Add an ->atomic_open implementation which replaces the atomic open+create
operation implemented via ->create. No functionality is changed.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:21 +0000 (15:10 +0200)]
nfs: don't use intents for checking atomic open
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.
It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail. This makes little sense now that this
function isn't used for determining whether to actually open the file or not.
The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:20 +0000 (15:10 +0200)]
nfs: don't use nd->intent.open.flags
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:19 +0000 (15:10 +0200)]
nfs: clean up ->create in nfs_rpc_ops
Don't pass nfs_open_context() to ->create(). Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents. That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.
Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain. So allocate a context here.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:18 +0000 (15:10 +0200)]
nfs: implement i_op->atomic_open()
Replace NFS4 specific ->lookup implementation with ->atomic_open implementation
and use the generic nfs_lookup for other lookups.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:17 +0000 (15:10 +0200)]
vfs: add i_op->atomic_open()
Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation. If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.
i_op->atomic_open() is only called if the last component is negative or needs
lookup. Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open(). If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.
For now leave the old way of using open intents in lookup and revalidate in
place. This will be removed once all the users are converted.
David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:16 +0000 (15:10 +0200)]
vfs: lookup_open(): expand lookup_hash()
Copy __lookup_hash() into lookup_open(). The next patch will insert the atomic
open call just before the real lookup.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:15 +0000 (15:10 +0200)]
vfs: add lookup_open()
Split out lookup + maybe create from do_last(). This is the part under i_mutex
protection.
The function is called lookup_open() and returns a filp even though the open
part is not used yet.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:14 +0000 (15:10 +0200)]
vfs: do_last(): common slow lookup
Make the slow lookup part of O_CREAT and non-O_CREAT opens common.
This allows atomic_open to be hooked into the slow lookup part.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:13 +0000 (15:10 +0200)]
vfs: do_last(): separate O_CREAT specific code
Check O_CREAT on the slow lookup paths where necessary. This allows the rest to
be shared with plain open.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Tue, 5 Jun 2012 13:10:12 +0000 (15:10 +0200)]
vfs: do_last(): inline lookup_slow()
Copy lookup_slow() into do_last().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 10 Jun 2012 08:15:17 +0000 (04:15 -0400)]
namei.c: let follow_link() do put_link() on failure
no need for kludgy "set cookie to ERR_PTR(...) because we failed
before we did actual ->follow_link() and want to suppress put_link()",
no pointless check in put_link() itself.
Callers checked if follow_link() has failed anyway; might as well
break out of their loops if that happened, without bothering
to call put_link() first.
[AV: folded fixes from hch]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 23:52:19 +0000 (19:52 -0400)]
coda: use list_for_each_entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 17:51:19 +0000 (13:51 -0400)]
vfs: switch i_dentry/d_alias to hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 17:19:12 +0000 (13:19 -0400)]
ext4: get rid of open-coded d_find_any_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 17:09:15 +0000 (13:09 -0400)]
ocfs2: use list_for_each_entry in ocfs2_find_local_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 17:06:09 +0000 (13:06 -0400)]
affs: unobfuscate affs_fix_dcache()
and add a comment on what it's doing
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 17:03:04 +0000 (13:03 -0400)]
affs: get rid of open-coded list_for_each_entry()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 15:55:20 +0000 (11:55 -0400)]
vfs: update documentation on ->i_dentry handling
we used to need to clean it in RCU callback freeing an inode;
in 3.2 that requirement went away. Unfortunately, it hadn't
been reflected in Documentation/filesystems/porting.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 15:51:12 +0000 (11:51 -0400)]
adfs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 15:50:36 +0000 (11:50 -0400)]
cifs: don't bother with ->i_dentry in ->destroy_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 15:49:04 +0000 (11:49 -0400)]
qnx6: don't bother with ->i_dentry in inode-freeing callback
we'll initialize it in inode_init_always() when we allocate that
object again.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2012 05:16:59 +0000 (01:16 -0400)]
get rid of magic in proc_namespace.c
don't rely on proc_mounts->m being the first field; container_of()
is there for purpose. No need to bother with ->private, while
we are at it - the same container_of will do nicely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
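The point of the proc_namespace.c commit above is that container_of() recovers the enclosing structure from a pointer to any member, so nothing has to rely on that member being the first field. A minimal userspace sketch, assuming hypothetical structure names (seq_sketch, mounts_sketch) rather than the real kernel types:

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structures; the embedded member is deliberately
 * NOT the first field. */
struct seq_sketch {
	int pos;
};

struct mounts_sketch {
	int ns_id;
	struct seq_sketch m;
};

/* Recover the enclosing structure from a pointer to its member. */
static struct mounts_sketch *mounts_of(struct seq_sketch *seq)
{
	return container_of(seq, struct mounts_sketch, m);
}
```

Because the offset is computed by the compiler, reordering the fields of mounts_sketch does not break mounts_of(), which is what a plain cast would silently do.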
Al Viro [Sat, 9 Jun 2012 04:59:08 +0000 (00:59 -0400)]
get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Julia Lawall [Thu, 7 Jun 2012 22:45:00 +0000 (15:45 -0700)]
fs/direct-io.c: adjust suspicious bit operation
READ is 0, so the result of the bit-and operation is 0. Rewrite with == as
done elsewhere in the same file.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
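The direct-io.c commit above turns on the fact that ANDing with a constant 0 can never be true. A minimal sketch of the two forms, assuming hypothetical names (SKETCH_READ, SKETCH_WRITE, is_read_buggy, is_read_fixed); only the value of READ comes from the commit text, and WRITE is taken as 1 for illustration:

```c
#include <stdbool.h>

/* READ is 0 per the commit message; WRITE as 1 is an assumption here. */
enum { SKETCH_READ = 0, SKETCH_WRITE = 1 };

/* Suspicious form: (rw & 0) is always 0, so this is never true. */
static bool is_read_buggy(int rw)
{
	return (rw & SKETCH_READ) != 0;
}

/* Form described in the commit: compare with == instead. */
static bool is_read_fixed(int rw)
{
	return rw == SKETCH_READ;
}
```

With these definitions is_read_buggy() returns false for every input, including a read, which is the kind of dead test the Coccinelle check flagged.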
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:57 +0000 (18:56 +0300)]
affs: get rid of affs_sync_super
This patch makes affs stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no dirty superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:56 +0000 (18:56 +0300)]
affs: introduce VFS superblock object back-reference
Add an 'sb' VFS superblock back-reference to the 'struct affs_sb_info' data
structure - we will need to find the VFS superblock from a 'struct
affs_sb_info' object in the next patch, so this change is just a preparation.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:55 +0000 (18:56 +0300)]
affs: stop using lock_super
The VFS's 'lock_super()' and 'unlock_super()' calls are deprecated and unwanted
and just wait for a brave knight who'd kill them. This patch makes AFFS stop
using them and use the buffer-head's own lock instead.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:54 +0000 (18:56 +0300)]
affs: re-structure superblock locking a bit
AFFS wants to serialize the superblock (the root block in AFFS terms) updates
and uses 'lock_super()/unlock_super()' for these purposes. This patch pushes the
locking down to the 'affs_commit_super()' from the callers.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:53 +0000 (18:56 +0300)]
affs: remove useless superblock writeout on remount
We do not need to write out the superblock from '->remount_fs()' because
VFS has already called '->sync_fs()' by this time and the superblock has
already been written out. Thus, remove the 'affs_write_super()'
invocation from 'affs_remount()'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:52 +0000 (18:56 +0300)]
affs: remove useless superblock writeout on unmount
We do not need to write out the superblock from '->put_super()' because VFS has
already called '->sync_fs()' by this time and the superblock has already been
written out. Thus, remove the 'affs_commit_super()' invocation from
'affs_put_super()'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Artem Bityutskiy [Wed, 6 Jun 2012 15:56:51 +0000 (18:56 +0300)]
affs: stop setting bm_flags
AFFS stores values '1' and '2' in 'bm_flags', and I fail to see any logic when
it prefers one or another. AFFS writes '1' only from '->put_super()', while
'->sync_fs()' and '->write_super()' store value '2'. So at first glance,
it looks like we want to have '1' if we unmount. However, this does not really
happen in these cases:
1. superblock is written via 'write_super()' then we unmount;
2. we re-mount R/O, then unmount.
which are quite typical.
I could not find good documentation describing this field, except for one random
piece of documentation on the internet which says that -1 means that the root
block is valid, which is not consistent with what we have in the Linux AFFS
driver.
Jan Kara commented on this: "I have some vague recollection that on Amiga
boolean was usually encoded as: 0 == false, ~0 == -1 == true. But it has been
ages..."
Thus, my conclusion is that value of '1' is as good as value of '2' and we can
just always use '2'. And Jan Kara suggested to go further: "generally bm_flags
handling looks strange. If they are 0, we mount fs read only and thus cannot
change them. If they are != 0, we write 2 there. So IMHO if you just removed
bm_flags setting, nothing will really happen."
So this patch removes the bm_flags setting completely. This makes the "clean"
argument of the 'affs_commit_super()' function unneeded, so it is also removed.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 14 Jul 2012 00:59:33 +0000 (17:59 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull use-after-free RAID1 bugfix from NeilBrown.
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: fix use-after-free bug in RAID1 data-check code.
Linus Torvalds [Fri, 13 Jul 2012 22:31:21 +0000 (15:31 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull the leap second fixes from Thomas Gleixner:
"It's a rather large series, but well discussed, refined and reviewed.
It got a massive testing by John, Prarit and tip.
In theory we could split it into two parts. The first two patches
f55a6faa3843: hrtimer: Provide clock_was_set_delayed()
4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue
are merely preventing the stuff loops forever issues, which people
have observed.
But there is no point in delaying the other 4 commits which achieve
full correctness into 3.6 as they are tagged for stable anyway. And I
rather prefer to have the full fixes merged in bulk than a "prevent
the observable wreckage and deal with the hidden fallout later"
approach."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Update hrtimer base offsets each hrtimer_interrupt
timekeeping: Provide hrtimer update function
hrtimers: Move lock held region in hrtimer_interrupt()
timekeeping: Maintain ktime_t based offsets for hrtimers
timekeeping: Fix leapsecond triggered load spike issue
hrtimer: Provide clock_was_set_delayed()
Will Drewry [Fri, 13 Jul 2012 17:06:35 +0000 (12:06 -0500)]
x86/vsyscall: allow seccomp filter in vsyscall=emulate
If a seccomp filter program is installed, older static binaries and
distributions with older libc implementations (glibc 2.13 and earlier)
that rely on vsyscall use will be terminated regardless of the filter
program policy when executing time, gettimeofday, or getcpu. This is
only the case when vsyscall emulation is in use (vsyscall=emulate is the
default).
This patch emulates system call entry inside a vsyscall=emulate by
populating regs->ax and regs->orig_ax with the system call number prior
to calling into seccomp such that all seccomp-dependencies function
normally. Additionally, system call return behavior is emulated in line
with other vsyscall entrypoints for the trace/trap cases.
[ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ]
Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 13 Jul 2012 18:01:03 +0000 (11:01 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Please pull one hwmon subsystem fix from Jean Delvare.
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (it87) Preserve configuration register bits on init
Linus Torvalds [Fri, 13 Jul 2012 17:58:45 +0000 (10:58 -0700)]
Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFSv4 mount regression
- Fix O_DIRECT list manipulation snafus
* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix an NFSv4 mount regression
NFS: Fix list manipulation snafus in fs/nfs/direct.c
Dave Jones [Fri, 13 Jul 2012 17:35:36 +0000 (13:35 -0400)]
Remove easily user-triggerable BUG from generic_setlease
This can be trivially triggered from userspace by passing in something unexpected.
kernel BUG at fs/locks.c:1468!
invalid opcode: 0000 [#1] SMP
RIP: 0010:generic_setlease+0xc2/0x100
Call Trace:
__vfs_setlease+0x35/0x40
fcntl_setlease+0x76/0x150
sys_fcntl+0x1c6/0x810
system_call_fastpath+0x1a/0x1f
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 13 Jul 2012 17:33:18 +0000 (10:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"The changes are limited to adding new VID/PID combinations to drivers
to enable support for new versions of hardware, most notably hardware
found in new MacBook Pro Retina boxes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add Andamiro Pump It Up pad
Input: xpad - add signature for Razer Onza Tournament Edition
Input: xpad - handle all variations of Mad Catz Beat Pad
Input: bcm5974 - Add support for 2012 MacBook Pro Retina
HID: add support for 2012 MacBook Pro Retina
Linus Torvalds [Fri, 13 Jul 2012 17:29:41 +0000 (10:29 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- Some regression fixes at the audio part for devices with
cx23885/cx25840
- A DMA corruption fix at cx231xx
- two fixes at the winbond IR driver
- Several fixes for the EXYNOS media driver (s5p)
- two fixes at the OMAP3 preview driver
- one fix at the dvb core failure path
- an include missing (slab.h) at smiapp-core causing compilation
breakage
- em28xx was not loading the IR driver anymore.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (31 commits)
[media] Revert "[media] V4L: JPEG class documentation corrections"
[media] s5p-fimc: Add missing FIMC-LITE file operations locking
[media] omap3isp: preview: Fix contrast and brightness handling
[media] omap3isp: preview: Fix output size computation depending on input format
[media] winbond-cir: Initialise timeout, driver_type and allowed_protos
[media] winbond-cir: Fix txandrx module info
[media] cx23885: Silence unknown command warnings
[media] cx23885: add support for HVR-1255 analog (cx23888 variant)
[media] cx23885: make analog support work for HVR_1250 (cx23885 variant)
[media] cx25840: fix vsrc/hsrc usage on cx23888 designs
[media] cx25840: fix regression in HVR-1800 analog audio
[media] cx25840: fix regression in analog support hue/saturation controls
[media] cx25840: fix regression in HVR-1800 analog support
[media] s5p-mfc: Fixed setup of custom controls in decoder and encoder
[media] cx231xx: don't DMA to random addresses
[media] em28xx: fix em28xx-rc load
[media] dvb-core: Release semaphore on error path dvb_register_device()
[media] s5p-fimc: Stop media entity pipeline if fimc_pipeline_validate fails
[media] s5p-fimc: Fix compiler warning in fimc-lite.c
[media] s5p-fimc: media_entity_pipeline_start() may fail
...
Linus Torvalds [Fri, 13 Jul 2012 17:27:25 +0000 (10:27 -0700)]
Merge tag 'mmc-fixes-for-3.5-rc7' of git://git./linux/kernel/git/cjb/mmc
Pull MMC fixes from Chris Ball:
- Revert a patch that made failing to select power class fatal;
it turns out that it fails non-fatally on Tegra boards.
Regression against 3.5-rc1.
- Add the IRQF_ONESHOT flag to the cd-gpio driver, which turned
into a regression in 3.5-rc1 when IRQF_ONESHOT became required
for threaded IRQs with no handler.
* tag 'mmc-fixes-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: cd-gpio: pass IRQF_ONESHOT to request_threaded_irq()
mmc: core: Revert "skip card initialization if power class selection fails"
Linus Torvalds [Fri, 13 Jul 2012 16:56:26 +0000 (09:56 -0700)]
Merge tag 'for-linus-20120712' of git://git.infradead.org/linux-mtd
Pull late MTD fixes from David Woodhouse:
- fix 'sparse warning fix' regression which totally breaks MXC NAND
- fix GPMI NAND regression when used with UBI
- update/correct sysfs documentation for new 'bitflip_threshold' field
- fix nandsim build failure
* tag 'for-linus-20120712' of git://git.infradead.org/linux-mtd:
mtd: nandsim: don't open code a do_div helper
mtd: ABI documentation: clarification of bitflip_threshold
mtd: gpmi-nand: fix read page when reading to vmalloced area
mtd: mxc_nand: use 32bit copy functions
Linus Torvalds [Fri, 13 Jul 2012 16:54:26 +0000 (09:54 -0700)]
Merge tag 'mfd-for-linus-3.5' of git://git./linux/kernel/git/sameo/mfd-2.6
Pull MFD Fixes from Samuel Ortiz:
- Three Palmas fixes, One of them being a build error fix.
- Two mc13xx fixes. One for fixing an SPI regmap configuration and
another one for working around an i.Mx hardware bug.
- One omap-usb regression fix.
- One twl6040 build breakage fix.
- One file deletion (ab5500-core.h) that was overlooked during the last
merge window.
* tag 'mfd-for-linus-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Add missing hunk to change palmas irq to clear on read
mfd: Fix palmas regulator pdata missing
mfd: USB: Fix the omap-usb EHCI ULPI PHY reset fix issues.
mfd: Update twl6040 Kconfig to avoid build breakage
mfd: Delete ab5500-core.h
mfd: mc13xxx workaround SPI hardware bug on i.Mx
mfd: Fix mc13xxx SPI regmap
mfd: Add terminating entry for i2c_device_id palmas table
Linus Torvalds [Fri, 13 Jul 2012 16:04:00 +0000 (09:04 -0700)]
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Pull SuperH fixes from Paul Mundt.
* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
SH: Convert out[bwl] macros to inline functions
sh: Fix up se7721 GPIOLIB=y build warnings.
Linus Torvalds [Fri, 13 Jul 2012 15:42:32 +0000 (08:42 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull a couple of KVM fixes from Avi Kivity:
"One is an adjustment for an irq layer change that affected device
assignment, the other a one-liner ppc fix."
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
powerpc/kvm: Fix "PR" KVM implementation of H_CEDE
KVM: Fix device assignment threaded irq handler
Jeff Moyer [Thu, 12 Jul 2012 13:43:14 +0000 (09:43 -0400)]
block: fix infinite loop in __getblk_slow
Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.
The problem was initially reported by Torsten Hilbrich here:
https://lkml.org/lkml/2012/6/18/54
and also reported independently here:
http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
and then Richard W.M. Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue. This patch has been
confirmed to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=835019
The main problem is here, in __getblk_slow:
	for (;;) {
		struct buffer_head * bh;
		int ret;

		bh = __find_get_block(bdev, block, size);
		if (bh)
			return bh;

		ret = grow_buffers(bdev, block, size);
		if (ret < 0)
			return NULL;
		if (ret == 0)
			free_more_memory();
	}
__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page. I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding. However, we also continue to loop for other cases, like the
block lying beyond the end of the disk. So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
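As a rough user-space illustration of the retry policy described above (not the kernel code itself), the sketch below retries only when the grow step reports an allocation failure (0) and gives up on a negative return. The toy_* names, the struct fields, and the exact return values are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; none of this is kernel API. */
struct toy_dev {
	bool beyond_end;          /* requested block lies past the end of the disk */
	int alloc_failures_left;  /* simulated memory-pressure failures */
	bool present;             /* true once the buffer can be found */
	int reclaim_calls;        /* how often we tried to free memory */
};

/* Returns a non-NULL token once the buffer exists, NULL otherwise. */
static bool *toy_find_get_block(struct toy_dev *d)
{
	return d->present ? &d->present : NULL;
}

/* Assumed convention: <0 hard error, 0 allocation failure, >0 success. */
static int toy_grow_buffers(struct toy_dev *d)
{
	if (d->beyond_end)
		return -1;
	if (d->alloc_failures_left > 0) {
		d->alloc_failures_left--;
		return 0;
	}
	d->present = true;
	return 1;
}

static void toy_free_more_memory(struct toy_dev *d)
{
	d->reclaim_calls++;
}

/* Loop again only after an allocation failure; a hard error ends the loop. */
static bool *toy_getblk_slow(struct toy_dev *d)
{
	for (;;) {
		bool *bh = toy_find_get_block(d);
		int ret;

		if (bh)
			return bh;
		ret = toy_grow_buffers(d);
		if (ret < 0)
			return NULL;
		if (ret == 0)
			toy_free_more_memory(d);
	}
}
```

With two simulated allocation failures the loop reclaims twice and then succeeds; with a block beyond the end of the device it returns NULL on the first pass instead of spinning.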
The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in all cases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org> # 3.0+
[ Jens is on vacation, taking this directly - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yuri Khan [Thu, 12 Jul 2012 05:12:31 +0000 (22:12 -0700)]
Input: xpad - add Andamiro Pump It Up pad
I couldn't find the vendor ID in any of the online databases, but this
mat has a Pump It Up logo on the top side of the controller compartment,
and a disclaimer stating that Andamiro will not be liable on the bottom.
Signed-off-by: Yuri Khan <yurivkhan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jean Delvare [Thu, 12 Jul 2012 20:47:37 +0000 (22:47 +0200)]
hwmon: (it87) Preserve configuration register bits on init
We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.
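The difference between a clobbering write and a read-modify-write can be sketched as follows. The bit names and positions are hypothetical and chosen only for illustration; they are not the IT87 register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only. */
#define TOY_CFG_START  0x01u  /* bit the init code wants set */
#define TOY_CFG_IRQ_EN 0x02u  /* bit the init code wants cleared */

/* Clobbering init: every bit not named here is silently lost. */
static uint8_t toy_cfg_init_clobber(uint8_t old)
{
	(void)old;
	return TOY_CFG_START;
}

/* Read-modify-write init: only the named bits change, the rest survive. */
static uint8_t toy_cfg_init_preserve(uint8_t old)
{
	return (uint8_t)((old & ~TOY_CFG_IRQ_EN) | TOY_CFG_START);
}
```

Starting from 0xC2, the clobbering variant yields 0x01 while the preserving variant yields 0xC1, keeping the two upper bits that the init code never meant to touch.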
Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Corey Minyard [Mon, 9 Jul 2012 20:35:20 +0000 (15:35 -0500)]
SH: Convert out[bwl] macros to inline functions
The macros just called BUG(), but that results in unused variable
warnings all over the place, like in the IPMI driver. The build
regression emails were annoying me, so here's the fix. I have
not even compile tested this, but it's rather obvious.
[ port type mangled to unsigned long ]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Linus Torvalds [Wed, 11 Jul 2012 23:17:14 +0000 (16:17 -0700)]
Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6
Pull fbdev fixes from Florian Tobias Schandinat:
"Two fixes for OMAPDSS by Tomi Valkeinen:
- one to avoid warnings when runtime PM is not enabled
- one workaround to dependency issues during suspend/resume"
* tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6:
OMAPDSS: fix warnings if CONFIG_PM_RUNTIME=n
OMAPDSS: Use PM notifiers for system suspend
Linus Torvalds [Wed, 11 Jul 2012 23:06:54 +0000 (16:06 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge random patches from Andrew Morton.
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits)
memblock: free allocated memblock_reserved_regions later
mm: sparse: fix usemap allocation above node descriptor section
mm: sparse: fix section usemap placement calculation
xtensa: fix incorrect memset
shmem: cleanup shmem_add_to_page_cache
shmem: fix negative rss in memcg memory.stat
tmpfs: revert SEEK_DATA and SEEK_HOLE
drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT
fat: fix non-atomic NFS i_pos read
MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section
sgi-xp: nested calls to spin_lock_irqsave()
fs: ramfs: file-nommu: add SetPageUptodate()
drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning
mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails
h8300/uaccess: add mising __clear_user()
h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()
h8300/time: add missing #include <asm/irq_regs.h>
h8300/signal: fix typo "statis"
h8300/pgtable: add missing #include <asm-generic/pgtable.h>
drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled
...
Yinghai Lu [Wed, 11 Jul 2012 21:02:56 +0000 (14:02 -0700)]
memblock: free allocated memblock_reserved_regions later
memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() can itself double reserved.regions, in which case we
could end up freeing the old range used for reserved.regions.
Also tj said there is another bug which could be related to this.
| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing. We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.
In that case, with DEBUG_PAGEALLOC enabled, we get a panic:
memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
BUG: unable to handle kernel paging request at ffff88102febd948
IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
RIP: 0010:[<ffffffff836a5774>] [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
See the discussion at https://lkml.org/lkml/2012/6/13/469
So try to allocate with PAGE_SIZE alignment and free it later.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 11 Jul 2012 21:02:53 +0000 (14:02 -0700)]
mm: sparse: fix usemap allocation above node descriptor section
After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section. This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.
The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.
Fix this by trying the allocation with the limit at the section end,
then retry without if that fails. This keeps the fix from f5bf18fa22f8
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.
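The "limited first, then unlimited" strategy can be illustrated with a toy slot allocator. Everything here (the toy_* names, the slot array, and the use of 0 to mean "no limit") is an assumption for the sketch, not the bootmem interface.

```c
#include <assert.h>

#define TOY_SLOTS    16
#define TOY_NO_LIMIT 0   /* assumed convention: 0 means "search to the end" */

/* First-fit search in [goal, limit); returns the slot index or -1. */
static int toy_alloc(unsigned char *used, int goal, int limit)
{
	int end = (limit == TOY_NO_LIMIT) ? TOY_SLOTS : limit;

	for (int i = goal; i < end; i++) {
		if (!used[i]) {
			used[i] = 1;
			return i;
		}
	}
	return -1;
}

/* Try to stay inside the section; fall back to anywhere above the goal. */
static int toy_alloc_prefer_section(unsigned char *used, int goal,
				    int section_end)
{
	int slot = toy_alloc(used, goal, section_end);

	if (slot < 0)
		slot = toy_alloc(used, goal, TOY_NO_LIMIT);
	return slot;
}
```

While the section has room the result stays inside it; once the section is full the second attempt places the allocation elsewhere instead of failing.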
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 11 Jul 2012 21:02:51 +0000 (14:02 -0700)]
mm: sparse: fix section usemap placement calculation
Commit 238305bb4d41 ("mm: remove sparsemem allocation details from the
bootmem allocator") introduced a bug in the allocation goal calculation
that put section usemaps not in the same section as the node
descriptors, creating unnecessary hotplug dependencies between them:
node 0 must be removed before remove section 16399
node 1 must be removed before remove section 16399
node 2 must be removed before remove section 16399
node 3 must be removed before remove section 16399
node 4 must be removed before remove section 16399
node 5 must be removed before remove section 16399
node 6 must be removed before remove section 16399
The reason is that it applies PAGE_SECTION_MASK to the physical address
of the node descriptor when finding a suitable place to put the usemap,
when this mask is actually intended to be used with PFNs. Because the
PFN mask is wider, the target address will point beyond the wanted
section holding the node descriptor and the node must be offlined before
the section holding the usemap can go.
Fix this by extending the mask to address width before use.
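A small arithmetic sketch of the mask mix-up follows, assuming that "extending the mask to address width" amounts to shifting the PFN mask left by the page shift. The TOY_* constants are illustrative values, not those of any particular architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real ones are per-architecture. */
#define TOY_PAGE_SHIFT        12
#define TOY_SECTION_SIZE_BITS 27
#define TOY_PFN_SECTION_SHIFT (TOY_SECTION_SIZE_BITS - TOY_PAGE_SHIFT)
#define TOY_PAGES_PER_SECTION (UINT64_C(1) << TOY_PFN_SECTION_SHIFT)
/* Mask meant for page frame numbers, not for physical addresses. */
#define TOY_PAGE_SECTION_MASK (~(TOY_PAGES_PER_SECTION - 1))

/* Wrong: PFN mask applied directly to a physical address. */
static uint64_t toy_section_goal_wrong(uint64_t pa)
{
	return pa & TOY_PAGE_SECTION_MASK;
}

/* Assumed fix: widen the mask to address granularity before masking. */
static uint64_t toy_section_goal_right(uint64_t pa)
{
	return pa & (TOY_PAGE_SECTION_MASK << TOY_PAGE_SHIFT);
}
```

For the physical address 0x12345678, the PFN mask clears only the low 15 bits and gives 0x12340000, whereas the widened mask clears the low 27 bits and gives the section start 0x10000000.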
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Wed, 11 Jul 2012 21:02:50 +0000 (14:02 -0700)]
xtensa: fix incorrect memset
Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=43871
Reported-by: <dcb314@hotmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:48 +0000 (14:02 -0700)]
shmem: cleanup shmem_add_to_page_cache
shmem_add_to_page_cache() has three callsites, but only one of them wants
the radix_tree_preload() (an exceptional entry guarantees that the radix
tree node is present in the other cases), and only that site can achieve
mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
other cases). We did it this way originally to reflect
add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
preloading and mem_cgroup uncharging to that one caller.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:47 +0000 (14:02 -0700)]
shmem: fix negative rss in memcg memory.stat
When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.
But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.
It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).
The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.
By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously. And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.
It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.
Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.
We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().
[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem. Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked. So it's not an error, and these
residual pages are easily freed once pressure demands.]
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:45 +0000 (14:02 -0700)]
tmpfs: revert SEEK_DATA and SEEK_HOLE
Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE"). I believe
it's correct, and it's been nice to have from rc1 to rc6; but as the
original commit said:
I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs. This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?
Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
the dumb generic support for v3.5. We can always reinstate it later if
useful, and anyone needing it in a hurry can just get it out of git.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Hilman [Wed, 11 Jul 2012 21:02:44 +0000 (14:02 -0700)]
drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT
Requesting a threaded interrupt without a primary handler and without
IRQF_ONESHOT is dangerous, and after commit 1c6c6952 ("genirq: Reject
bogus threaded irq requests"), these requests are rejected. This causes
->probe() to fail, and the RTC driver not to be available.
To fix, add IRQF_ONESHOT to the IRQ flags.
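The rejection rule described above can be modelled in user space as below. This is a toy validator, not the genirq code: the toy_* names and the flag value are assumptions for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IRQF_ONESHOT 0x2000u  /* assumed flag value for this sketch */

typedef int (*toy_handler_t)(int irq, void *dev);

/*
 * Toy model of the check: a request with only a threaded handler is
 * accepted solely when the one-shot flag is set.
 */
static int toy_request_threaded_irq(toy_handler_t primary,
				    toy_handler_t thread_fn,
				    unsigned int flags)
{
	if (!primary && !thread_fn)
		return -EINVAL;
	if (!primary && !(flags & TOY_IRQF_ONESHOT))
		return -EINVAL;
	return 0;
}

/* Hypothetical threaded handler standing in for the RTC driver's. */
static int toy_rtc_thread(int irq, void *dev)
{
	(void)irq;
	(void)dev;
	return 0;
}
```

Calling toy_request_threaded_irq(NULL, toy_rtc_thread, 0) fails with -EINVAL, which is the situation that makes ->probe() fail; passing TOY_IRQF_ONESHOT makes the same request succeed.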
Tested on OMAP3730/OveroSTORM and OMAP4430/Panda board using rtcwake to
wake from system suspend multiple times.
Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven J. Magnani [Wed, 11 Jul 2012 21:02:42 +0000 (14:02 -0700)]
fat: fix non-atomic NFS i_pos read
fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic. Make it use the same accessor as the rest of the
FAT code.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Hilman [Wed, 11 Jul 2012 21:02:40 +0000 (14:02 -0700)]
MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section
Add the OMAP CPUFreq driver to the list of files in the OMAP Power
Management section.
I've already been maintaining this driver, this just makes it official.
Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Wed, 11 Jul 2012 21:02:38 +0000 (14:02 -0700)]
sgi-xp: nested calls to spin_lock_irqsave()
The code here has a nested spin_lock_irqsave(). It's not needed since
IRQs are already disabled and it causes a problem because it means that
IRQs won't be enabled again at the end. The second call to
spin_lock_irqsave() will overwrite the value of irq_flags and we can't
restore the proper settings.
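The effect of reusing one flags variable for a nested save can be shown with a toy model of the interrupt state. The toy_* helpers only mimic the save/restore semantics and are assumptions for this sketch, not the kernel primitives.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_irqs_enabled = true;

/* Save the current state into *flags, then disable "interrupts". */
static void toy_irqsave(bool *flags)
{
	*flags = toy_irqs_enabled;
	toy_irqs_enabled = false;
}

static void toy_irqrestore(bool flags)
{
	toy_irqs_enabled = flags;
}

/* Nested save into the same variable: the original state is lost. */
static bool toy_nested_reusing_flags(void)
{
	bool irq_flags;

	toy_irqs_enabled = true;
	toy_irqsave(&irq_flags);   /* saves "enabled" */
	toy_irqsave(&irq_flags);   /* overwrites it with "disabled" */
	toy_irqrestore(irq_flags);
	toy_irqrestore(irq_flags); /* restores "disabled" */
	return toy_irqs_enabled;
}

/* Single save: the inner section relies on IRQs already being off. */
static bool toy_single_save(void)
{
	bool irq_flags;

	toy_irqs_enabled = true;
	toy_irqsave(&irq_flags);
	/* inner critical section runs here without saving again */
	toy_irqrestore(irq_flags);
	return toy_irqs_enabled;
}
```

In the nested variant the final state is "disabled" even though the code started with interrupts enabled; the single-save variant ends with them enabled again.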
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Wed, 11 Jul 2012 21:02:35 +0000 (14:02 -0700)]
fs: ramfs: file-nommu: add SetPageUptodate()
There is a bug in the following scenario for !CONFIG_MMU:
1. create a new file
2. mmap the file and write to it
3. read the file: the read does not return the value just written
This is because
sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()
causes the page to be zeroed.
Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() does not call simple_readpage().
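A toy model of the behaviour described above: a read path that zero-fills any page not marked up to date. The struct and the toy_* functions are assumptions for this sketch and only mirror the shape of the call chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical page with an "up to date" flag and a few data bytes. */
struct toy_page {
	bool uptodate;
	char data[8];
};

/* Toy readpage: zero-fills the page and marks it up to date. */
static void toy_readpage(struct toy_page *p)
{
	memset(p->data, 0, sizeof(p->data));
	p->uptodate = true;
}

/* Toy read path: only pages not yet up to date go through readpage. */
static char toy_read_first_byte(struct toy_page *p)
{
	if (!p->uptodate)
		toy_readpage(p);
	return p->data[0];
}
```

A page holding 'x' but not flagged up to date reads back as 0, while the same page with the flag set reads back as 'x'.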
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benoît Thébaudeau [Wed, 11 Jul 2012 21:02:32 +0000 (14:02 -0700)]
drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning
Fixes
WARNING: at irq/handle.c:146 handle_irq_event_percpu+0x19c/0x1b8()
irq 25 handler mxc_rtc_interrupt+0x0/0xac enabled interrupts
Modules linked in:
(unwind_backtrace+0x0/0xf0) from (warn_slowpath_common+0x4c/0x64)
(warn_slowpath_common+0x4c/0x64) from (warn_slowpath_fmt+0x30/0x40)
(warn_slowpath_fmt+0x30/0x40) from (handle_irq_event_percpu+0x19c/0x1b8)
(handle_irq_event_percpu+0x19c/0x1b8) from (handle_irq_event+0x28/0x38)
(handle_irq_event+0x28/0x38) from (handle_level_irq+0x80/0xc4)
(handle_level_irq+0x80/0xc4) from (generic_handle_irq+0x24/0x38)
(generic_handle_irq+0x24/0x38) from (handle_IRQ+0x30/0x84)
(handle_IRQ+0x30/0x84) from (avic_handle_irq+0x2c/0x4c)
(avic_handle_irq+0x2c/0x4c) from (__irq_svc+0x40/0x60)
Exception stack(0xc050bf60 to 0xc050bfa8)
bf60: 00000001 00000000 003c4208 c0018e20 c050a000 c050a000 c054a4c8 c050a000
bf80: c05157a8 4117b363 80503bb4 00000000 01000000 c050bfa8 c0018e2c c000e808
bfa0: 60000013 ffffffff
(__irq_svc+0x40/0x60) from (default_idle+0x1c/0x30)
(default_idle+0x1c/0x30) from (cpu_idle+0x68/0xa8)
(cpu_idle+0x68/0xa8) from (start_kernel+0x22c/0x26c)
Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sascha Hauer <kernel@pengutronix.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wen Congyang [Wed, 11 Jul 2012 21:02:31 +0000 (14:02 -0700)]
mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails
We should jump to the error path to release the memory resource if
hotadd_new_pgdat() fails.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki ISIMATU <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 11 Jul 2012 21:02:28 +0000 (14:02 -0700)]
h8300/uaccess: add mising __clear_user()
Fix the build error:
include/linux/regset.h: In function 'user_regset_copyout_zero':
include/linux/regset.h:289:3: error: implicit declaration of function '__clear_user' [-Werror=implicit-function-declaration]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 11 Jul 2012 21:02:26 +0000 (14:02 -0700)]
h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()
__gu_val is const if the passed ptr is const, giving:
include/linux/pagemap.h: In function 'fault_in_pages_readable':
include/linux/pagemap.h:442:2: error: assignment of read-only variable '__gu_val'
include/linux/pagemap.h:448:4: error: assignment of read-only variable '__gu_val'
include/linux/pagemap.h: In function 'fault_in_multipages_readable':
include/linux/pagemap.h:499:3: error: assignment of read-only variable '__gu_val'
include/linux/pagemap.h:508:3: error: assignment of read-only variable '__gu_val'
make[4]: *** [init/main.o] Error 1
As we don't care about the actual value of __gu_val in the unhandled
case (it will cause a link error anyway), just remove the assignment.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 11 Jul 2012 21:02:23 +0000 (14:02 -0700)]
h8300/time: add missing #include <asm/irq_regs.h>
Fix the build error:
arch/h8300/kernel/time.c: In function 'h8300_timer_tick':
arch/h8300/kernel/time.c:39:2: error: implicit declaration of function 'get_irq_regs' [-Werror=implicit-function-declaration]
arch/h8300/kernel/time.c:39:42: error: invalid type argument of '->' (have 'int')
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 11 Jul 2012 21:02:22 +0000 (14:02 -0700)]
h8300/signal: fix typo "statis"
The keyword is "static", not "statis":
arch/h8300/kernel/signal.c:455:8: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
arch/h8300/kernel/signal.c: In function 'do_notify_resume':
arch/h8300/kernel/signal.c:511:3: error: implicit declaration of function 'do_signal' [-Werror=implicit-function-declaration]
arch/h8300/kernel/signal.c: At top level:
arch/h8300/kernel/signal.c:414:1: warning: 'handle_signal' defined but not used [-Wunused-function]
Introduced in commit 7ae4e32a6514 ("h8300: switch to saved_sigmask-based
sigsuspend/rt_sigsuspend")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Wed, 11 Jul 2012 21:02:19 +0000 (14:02 -0700)]
h8300/pgtable: add missing #include <asm-generic/pgtable.h>
Fix the h8300 build error:
kernel/sched/core.c: In function 'context_switch':
kernel/sched/core.c:2061:2: error: implicit declaration of function 'arch_start_context_switch' [-Werror=implicit-function-declaration]
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>