Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

f2fs: recover when journal contains deleted files

When recovering a journal file with fsync data for files that have
been deleted, don't bail out on recovery.

Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: continue to mount after failing recovery

When unable to roll forward the journal, we shouldn't bail out and
not mount, we should continue to attempt the mount. Bad recovery data
is likely unrecoverable at this point, and requiring the user to try
to mount again doesn't solve any issues.

Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid deadlock during evict after f2fs_gc

o Deadlock case #1

Thread 1:
- writeback_sb_inodes
- do_writepages
  - f2fs_write_data_pages
   - write_cache_pages
    - f2fs_write_data_page
     - f2fs_balance_fs
      - wait mutex_lock(gc_mutex)

Thread 2:
- f2fs_balance_fs
- mutex_lock(gc_mutex)
- f2fs_gc
  - f2fs_iget
   - wait iget_locked(inode->i_lock)

Thread 3:
- do_unlinkat
- iput
  - lock(inode->i_lock)
   - evict
    - inode_wait_for_writeback

o Deadlock case #2

Thread 1:
- __writeback_single_inode
: set I_SYNC
  - do_writepages
   - f2fs_write_data_page
    - f2fs_balance_fs
     - f2fs_gc
      - iput
       - evict
        - inode_wait_for_writeback(I_SYNC)

In order to avoid this, even though iput is called with the zero-reference
count, we need to stop the eviction procedure if the inode is on writeback.
So this patch links f2fs_drop_inode which checks the I_SYNC flag.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: modify the number of issued pages to merge IOs

When testing f2fs on an SSD, I found some 128 page IOs followed by 1 page IO
were issued by f2fs_write_node_pages.
This means that there were some mishandling flows which degrades performance.

Previous f2fs_write_node_pages determines the number of pages to be written,
nr_to_write, as follows.

1. The bio_get_nr_vecs returns 129 pages.
2. The bio_alloc makes a room for 128 pages.
3. The initial 128 pages go into one bio.
4. The existing bio is submitted, and a new bio is prepared for the last 1 page.
5. Finally, sync_node_pages submits the last 1 page bio.

The problem is from the use of bio_get_nr_vecs, so this patch replace it
with max_hw_blocks using queue_max_sectors.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix inconsistent using of NM_WOUT_THRESHOLD

try_to_free_nats() is usually called with parameter nr_shrink as
"nm_i->nat_cnt - NM_WOUT_THRESHOLD"
by flush_nat_entries() during checkpointing process.

However, this is inconsistent with the actual threshold check as
"if (nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD)"
, which will ignore the free_nats requests when
NM_WOUT_THRESHOLD < nm_i->nat_cnt < 2 * NM_WOUT_THRESHOLD

So fix the threshold check condition.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check truncation of mapping after lock_page

We call lock_page when we need to update a page after readpage.
Between grab and lock page, the page can be truncated by other thread.
So, we should check the page after lock_page whether it was truncated or not.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: enhance alloc_nid and build_free_nids flows

In order to avoid build_free_nid lock contention, let's change the order of
function calls as follows.

At first, check whether there is enough free nids.
- If available, just get a free nid with spin_lock without any overhead.
- Otherwise, conduct build_free_nids.
: scan nat pages, journal nat entries, and nat cache entries.

We should consider carefullly not to serve free nids intermediately made by
build_free_nids.
We can get stable free nids only after build_free_nids is done.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add a tracepoint on f2fs_new_inode

This can help when debugging the free nid allocation flows.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check nid == 0 in add_free_nid

It is more obvious that add_free_nid checks whether the free nid is zero or not.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add REQ_META about metadata requests for submit

Adding REQ_META for all the metadata requests can help in improving the
FS performance, if the underlying device supports TAGGING.
So, when considering the submit_bio path for all the f2fs requests. We can
add REQ_META for all the META requests.
As a precursor to this change we considered the commit
4265900e0be653f5b78baf2816857ef57cf1332f 'mmc: MMC-4.5 Data Tag Support'

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: give a chance to merge IOs by IO scheduler

Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.

...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...

However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.

...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...

Note that this issue should be addressed in checkpoint, and some readahead
flows too.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid frequent background GC

If there is no victim segments selected by background GC, let's wait
a little bit longer time to collect dirty segments.
By default, let's give 5 minutes.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints to debug checkpoint request

Add tracepoints to debug checkpoint request.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: change expressions]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for write page operations

Add tracepoints to debug the various page write operation
like data pages, meta pages.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: remove unnecessary tracepoints]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints to debug the block allocation

Add tracepoints to debug the block allocation & fallocate.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: enhance information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for GC threads

Add tracepoints for tracing the garbage collector
threads in f2fs with status of collection & type.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: modify slightly to show information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoint for tracing the page i/o

Add tracepoints for page i/o operations and block allocation
tracing during page read operation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for truncate operation

add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.

Tracepoints are added at entry and exit of operation
to trace the success & failure of operation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add tracepoints for sync & inode operations

Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will helf to trace the code under debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building inode, eviction of inode, link/unlike of
inodes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: make is_multimedia_file code align with its name

The code conditions put inside the function is_multimedia_file are
reverse to the name i.e, we need to negate the return to actually
check if the file is a multimedia file. So, change the code and usage
path to align both the name and comparision conditions.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix error return code in f2fs_fill_super()

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Introduce by commit c0d39e(f2fs: fix return values from validate superblock)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix typo mistakes

Fix typo mistakes.
1. I think that it should be 'L' instead of 'V'.
2. and try to fix 'Front' instead of 'Frone'

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: write checkpoint before starting FG_GC

In order to be aware of prefree and free sections during FG_GC, let's start with
write_checkpoint().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the logic of IS_DNODE()

If (ofs % (NIDS_PER_BLOCK + 1) == 0), the node is an indirect node block.

Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce a new global lock scheme

In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.

3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.

5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()

Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: move f2fs_balance_fs from truncate to punch_hole

Move the f2fs_balance_fs out of the truncate_hole function and only
perform that in punch_hole use case.  The commit:

  ed60b1644e7f7e5dd67d21caf7e4425dff05dad0

intended to do this but moved it into truncate_hole to cover more
cases.  However, a deadlock scenario is possible when deleting an inode
entry under specific conditions:

f2fs_delete_entry()
     mutex_lock_op(sbi, DENTRY_OPS);
     truncate_hole()
         f2fs_balance_fs()
             mutex_lock(&sbi->gc_mutex);
             f2fs_gc()
                 write_checkpoint()
                     block_operations()
                         mutex_lock_op(sbi, DENTRY_OPS);

Lets move it into the punch_hole case to cover the original intent of
avoiding it during fallocate's expand_inode_data case.

Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reduce redundant spin_lock operations

This patch reduces redundant spin_lock operations in alloc_nid_failed().
The alloc_nid_failed() does not need to delete entry and add one again
by triggering spin_lock and spin_unlock redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: update f2fs.txt related with discard at mkfs

o mkfs.f2fs supports no discard option.
o fixed volume label size in 512 bytes.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: add NULL pointer check

Commit - fa9150a84c - replaces a call to generic_writepages() in
f2fs_write_data_pages() with write_cache_pages(), with a function pointer
argument pointing to routine: __f2fs_writepage.

  -> https://git.kernel.org/linus/fa9150a84ca333f68127097c4fa1eda4b3913a22

  This patch adds a NULL pointer check in f2fs_write_data_pages() to avoid
  a possible NULL pointer dereference, in case if - mapping->a_ops->writepage -
  is NULL.

Signed-off-by: P J P <ppandit@redhat.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the bitmap consistency of dirty segments

Like below, there are 8 segment bitmaps for SSR victim candidates.

enum dirty_type {
DIRTY_HOT_DATA, /* dirty segments assigned as hot data logs */
DIRTY_WARM_DATA, /* dirty segments assigned as warm data logs */
DIRTY_COLD_DATA, /* dirty segments assigned as cold data logs */
DIRTY_HOT_NODE, /* dirty segments assigned as hot node logs */
DIRTY_WARM_NODE, /* dirty segments assigned as warm node logs */
DIRTY_COLD_NODE, /* dirty segments assigned as cold node logs */
DIRTY, /* to count # of dirty segments */
PRE, /* to count # of entirely obsolete segments */
NR_DIRTY_TYPE
};

The upper 6 bitmaps indicates segments dirtied by active log areas respectively.
And, the DIRTY bitmap integrates all the 6 bitmaps.

For example,
o DIRTY_HOT_DATA : 1010000
o DIRTY_WARM_DATA: 0100000
o DIRTY_COLD_DATA: 0001000
o DIRTY_HOT_NODE : 0000010
o DIRTY_WARM_NODE: 0000001
o DIRTY_COLD_NODE: 0000000
In this case,
o DIRTY : 1111011,

which means that we should guarantee the consistency between DIRTY and other
bitmaps concreately.

However, the SSR mode selects victims freely from any log types, which can set
multiple bits across the various bitmap types.

So, this patch eliminates this inconsistency.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid race for summary information

In order to do GC more reliably, I'd like to lock the vicitm summary page
until its GC is completed, and also prevent any checkpoint process.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: allocate remained free segments in the LFS mode

This patch adds a new condition that allocates free segments in the current
active section even if SSR is needed.
Otherwise, f2fs cannot allocate remained free segments in the section since
SSR finds dirty segments only.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check completion of foreground GC

The foreground GCs are triggered under not enough free sections.
So, we should not skip moving valid blocks in the victim segments.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: change GC bitmaps to apply the section granularity

This patch removes a bitmap for victim segments selected by foreground GC, and
modifies the other bitmap for victim segments selected by background GC.

1) foreground GC bitmap
: We don't need to manage this, since we just only one previous victim section
   number instead of the whole victim history.
   The f2fs uses the victim section number in order not to allocate currently
   GC'ed section to current active logs.

2) background GC bitmap
: This bitmap is used to avoid selecting victims repeatedly by background GCs.
   In addition, the victims are able to be selected by foreground GCs, since
   there is no need to read victim blocks during foreground GCs.

   By the fact that the foreground GC reclaims segments in a section unit, it'd
   be better to manage this bitmap based on the section granularity.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: allocate new segment aligned with sections

When allocating a new segment under the LFS mode, we should keep the section
boundary.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remove redundant lock_page calls

In get_node_page, we do not need to call lock_page all the time.

If the node page is cached as uptodate,

1. grab_cache_page locks the page,
2. read_node_page unlocks the page, and
3. lock_page is called for further process.

Let's avoid this.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce TOTAL_SECS macro

Let's use a macro to get the total number of sections.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: do not use duplicate names in a macro

A macro should not use duplicate parameter names.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: use kmemdup

Use kmemdup instead of kzalloc and memcpy.

Signed-off-by: Alexandru Gheorghiu <gheorghiuandru@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix to give correct parent inode number for roll forward

When we recover fsync'ed data after power-off-recovery, we should guarantee
that any parent inode number should be correct for each direct inode blocks.

So, let's make the following rules.

- The fsync should do checkpoint to all the inodes that were experienced hard
links.

- So, the only normal files can be recovered by roll-forward.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remain nat cache entries for further free nid allocation

In the checkpoint flow, the f2fs investigates the total nat cache entries.
Previously, if an entry has NULL_ADDR, f2fs drops the entry and adds the
obsolete nid to the free nid list.
However, this free nid will be reused sooner, resulting in its nat entry miss.
In order to avoid this, we don't need to drop the nat cache entry at this moment.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: do not skip writing file meta during fsync

This patch removes data_version check flow during the fsync call.
The original purpose for the use of data_version was to avoid writng inode
pages redundantly by the fsync calls repeatedly.
However, when user can modify file meta and then call fsync, we should not
skip fsync procedure.
So, let's remove this condition check and hope that user triggers in right
manner.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix the recovery flow to handle errors correctly

We should handle errors during the recovery flow correctly.
For example, if we get -ENOMEM, we should report a mount failure instead of
conducting the remained mount procedure.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix typo in comments

Correct spelling typo in comments

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inode

In function check_nid_range, there is no need to trigger BUG_ON and make kernel stop.
Instead it could just check and indicate the inode number to be EINVAL.
Update the return path in do_read_inode to use the return from check_nid_range.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
[Jaegeuk: replace BUG_ON with WARN_ON]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix return values from validate superblock

validate super block is not returning with proper values.
When failure from sb_bread it should reflect there is an EIO otherwise
it should return of EINVAL.
Returning, '1' is not conveying proper message as the return type.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reorganize f2fs_setxattr

make use of F2FS_NAME_LEN for name length checking,
change return conditions at few places, by assigning
storing the errorvalue in 'error' and making a common
exit path.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: notify when discard is not supported

Change f2fs so that a warning is emitted when an attempt is made to
mount a filesystem with the unsupported discard option.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix to call WRITE_FLUSH at the end of fsync

The fsync call should be ended after flushing the in-device caches.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix not to allocate max_nid

The build_free_nid should not add free nids over nm_i->max_nid.
But, there was a hole that invalid free nid was added by the following scenario.

Let's suppose nm_i->max_nid = 150 and the last NAT page has 100 ~ 200 nids.

build_free_nids
  - get_current_nat_page loads the last NAT page
  - scan_nat_page can add 100 ~ 200 nids
    -> Bug here!
So, when scanning an NAT page, we should check each candidate whether it is
over max_nid or not.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix return value of releasepage for node and data

If the return value of releasepage is equal to zero, the page cannot be reclaimed.
Instead, we should return 1 in order to reclaim clean pages.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: scan next nat page to reuse free nids in there

When we build new free nids, let's scan the just next NAT page instead of
skipping a couple of previously scanned pages in order to reuse free nids in
there.
Otherwise, we can use too much wide range of nids even though several nids were
deallocated, and also their node pages can be cached in the node_inode's address
space.
This means that we can retain lots of clean pages in the main memory, which
induces mm's reclaiming overhead.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: should check the node page was truncated first

Currently, f2fs doesn't reclaim any node pages.
However, if we found that a node page was truncated by checking its block
address with zero during f2fs_write_node_page, we should not skip that node
page and return zero to reclaim it.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: reduce unncessary locking pages during read

This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.

[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: avoid extra ++ while returning from get_node_path

In all the breaking conditions in get_node_path, 'n' is used to
track index in offset[] array, but while breaking out also, in all
paths n++ is done.
So, remove the ++ from breaking paths. Also, avoid
reset of 'level=0' in first case.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: align f2fs maximum name length to linux based filesystem

The maximum filename length supported in linux is 255 characters.
So let's follow that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: optimize and change return path in lookup_free_nid_list

Optimize and change return path in lookup_free_nid_list

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: optimize get node page readahead part

We can remove the call to find_get_page to get a page from the cache
and check for up-to-date, instead we can make use of grab_cache_page
part itself to fetch the page from the cache.
So, removing the call and moving the PageUptodate at proper place, also
taken care of moving the lock_page condition in the page_hit part.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: check the level before calling get_nid function

The caller of get_nid should be careful not to put lower value than
NODE_DIR1_BLOCK in case of level is zero.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce readahead mode of node pages

Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism is initially introduced for the use of
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
ALLOC_NODE, /* allocate a new node page if needed */
LOOKUP_NODE, /* look up a node without readahead */
LOOKUP_NODE_RA, /*
* look up a node with readahead called
* by get_datablock_ro.
*/
}

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

f2fs: read with READ_SYNC when getting dnode page

The get_node_page_ra tries to:
1. grab or read a target node page for the given nid,
2. then, call ra_node_page to read other adjacent node pages in advance.

So, when we try to read a target node page by #1, we should submit bio with
READ_SYNC instead of READA.
And, in #2, READA should be used.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

f2fs: fix to unlock node page when it was truncated

If the node page was truncated, its block address became zero.
This means that we don't need to write the node page, but have to unlock
NODE_WRITE, decrease the number of dirty node pages, and then unlock_page
before returning the f2fs_write_node_page with zero.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: fix overflow when calculating utilization on 32-bit

Use div_u64 to fix overflow when calculating utilization.
*long int* is 4-bytes on 32-bit so (user blocks * 100) might be
overflow if disk size is over e.g. 512GB.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Several boot fixes (MacBook, legacy EFI bootloaders), another
  please-don't-brick fix, and some minor stuff."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Do not try to sync identity map for non-mapped pages
  x86, doc: Be explicit about what the x86 struct boot_params requires
  x86: Don't clear efi_info even if the sentinel hits
  x86, mm: Make sure to find a 2M free block for the first mapped area
  x86: Fix 32-bit *_cpu_data initializers
  efivarfs: return accurate error code in efivarfs_fill_super()
  efivars: efivarfs_valid_name() should handle pstore syntax
  efi: be more paranoid about available space when creating variables
  iommu, x86: Add DMA remap fault reason
  x86, smpboot: Remove unused variable

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Misc radeon, nouveau, mgag200 and intel fixes.

  The intel fixes should contain the fix for the touchpad on the
  Chromebook - hey I'm an input maintainer now!"

Hate to pee on your parade, Dave, but I don't think being an input
maintainer is necessarily something to strive for..

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (25 commits)
  drm/tegra: drop "select DRM_HDMI"
  drm: Documentation typo fixes
  drm/mgag200: Bug fix: Renesas board now selects native resolution.
  drm/mgag200: Reject modes that are too big for VRAM
  drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used
  drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
  drm/radeon: skip MC reset as it's probably not hung
  drm/radeon: add primary dac adj quirk for R200 board
  drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled
  drm/i915: Turn off hsync and vsync on ADPA when disabling crt
  drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
  drm/i915: also disable south interrupts when handling them
  drm/i915: enable irqs earlier when resuming
  drm/i915: Increase the RC6p threshold.
  DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
  drm/nv50-: prevent some races between modesetting and page flipping
  drm/nouveau/i2c: drop parent refcount when creating ports
  drm/nv84: fix regression in page flipping
  drm/nouveau: Fix typo in init_idx_addr_latched().
  drm/nouveau: Disable AGP on PowerPC again.
  ...

Merge tag 'pm+acpi-3.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael J Wysocki:

- Two fixes for the new intel_pstate driver from Dirk Brandewie.

- Fix for incorrect usage of the .find_bridge() callback from struct
   acpi_bus_type in the USB core and subsequent removal of that callback
   from Rafael J Wysocki.

- ACPI processor driver cleanups from Chen Gang and Syam Sidhardhan.

- ACPI initialization and error messages fix from Joe Perches.

- Operating Performance Points documentation improvement from Nishanth
   Menon.

- Fixes for memory leaks and potential concurrency issues and sysfs
  attributes leaks during device removal in the core device PM QoS code
  from Rafael J Wysocki.

- Calxeda Highbank cpufreq driver simplification from Emilio López.

- cpufreq comment cleanup from Namhyung Kim.

- Fix for a section mismatch in Calxeda Highbank interprocessor
   communication code from Mark Langsdorf (this is not a PM fix strictly
   speaking, but the code in question went in through the PM tree).

* tag 'pm+acpi-3.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq / intel_pstate: Do not load on VM that does not report max P state.
  cpufreq / intel_pstate: Fix intel_pstate_init() error path
  ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
  ACPI / glue: Add .match() callback to struct acpi_bus_type
  ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
  ACPI / processor: Remove redundant NULL check before kfree
  ACPI / Sleep: Avoid interleaved message on errors
  PM / QoS: Remove device PM QoS sysfs attributes at the right place
  PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
  cpufreq: highbank: do not initialize array with a loop
  PM / OPP: improve introductory documentation
  cpufreq: Fix a typo in comment
  mailbox, pl320-ipc: remove __init from probe function

drm/tegra: drop "select DRM_HDMI"

Commit ac24c2204a76e5b42aa103bf963ae0eda1b827f3 ("drm/tegra: Use generic
HDMI infoframe helpers") added "select DRM_HDMI" to the DRM_TEGRA
Kconfig entry. But there is no Kconfig symbol named DRM_HDMI. The select
statement for that symbol is a nop. Drop it.

What was needed to use HDMI functionality was to select HDMI (which this
entry already did through depending on DRM) and to include linux/hdmi.h
(which this commit also did).

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: Documentation typo fixes

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: Bug fix: Renesas board now selects native resolution.

Renesas boards were consistently defaulting to the 1024x768 resolution,
regardless of the native resolution of the monitor plugged in.  It was
determined that the EDID of the monitor was not being read.  Since the
DAC is a shared line, in order to read from or write to it we must take
control of the DAC clock.  This can be done by setting the proper
register to one.

This bug fix sets the register MGA1064_GEN_IO_CTL2 to one.  The DAC
control line can be used to determine whether or not a new monitor has
been plugged in.  But since the hotplug feature is not one we will
support, it has been decided to simply leave the register set to one.

Signed-off-by: Julia Lemire <jlemire@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: Reject modes that are too big for VRAM

A monitor or a user could request a resolution greater than the
available VRAM for the backing framebuffer. This change checks the
required framebuffer size against the max VRAM size and rejects modes
if they are too big. This change can also remove a mode request passed
in via the video= parameter.

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/mgag200: 'fbdev_list' in 'struct mga_fbdev' is not used

Signed-off-by: Christopher Harvey <charvey@matrox.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux into drm-next

Alex writes:
  Radeon fixes pull.  Not much to it.
  - fix some splatter if the interrupt handler isn't registered
  - Add a quirk for an old R200 board to fix washed out colors on the DAC
  - Don't try and soft reset the MC when we reset the GPU.  It usually doesn't
    need it and doesn't always work reliably.
  - A CS checker fix from Marek

* 'drm-fixes-3.9' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK
  drm/radeon: skip MC reset as it's probably not hung
  drm/radeon: add primary dac adj quirk for R200 board
  drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"Mainly a group of fixes, the only exception is the wiring up of the
  kcmp syscall now that those patches went in during the last merge
  window."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations
  ARM: 7667/1: perf: Fix section mismatch on armpmu_init()
  ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor
  ARM: 7665/1: Wire up kcmp syscall
  ARM: 7664/1: perf: remove erroneous semicolon from event initialisation
  ARM: 7663/1: perf: fix ARMv7 EVTYPE_MASK to include NSH bit
  ARM: 7662/1: hw_breakpoint: reset debug logic on secondary CPUs in s2ram resume
  ARM: 7661/1: mm: perform explicit branch predictor maintenance when required
  ARM: 7660/1: tlb: add branch predictor maintenance operations
  ARM: 7659/1: mm: make mm->context.id an atomic64_t variable
  ARM: 7658/1: mm: fix race updating mm->context.id on ASID rollover
  ARM: 7657/1: head: fix swapper and idmap population with LPAE and big-endian
  ARM: 7655/1: smp_twd: make twd_local_timer_of_register() no-op for nosmp
  ARM: 7652/1: mm: fix missing use of 'asid' to get asid value from mm->context.id
  ARM: 7642/1: netx: bump IRQ offset to 64

Merge tag 'efi-for-3.9-rc2' into x86/urgent

EFI changes for v3.9-rc2,

  * Make the EFI variable code more paranoid about running out of
    space in NVRAM, since this is the root cause of the recent issue
    where machines refuse to boot - from Matthew Garrett.

  * Some efivarfs patches that fix regressions introduced in v3.9-rc1.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Do not try to sync identity map for non-mapped pages

kernel_map_sync_memtype() is called from a variety of contexts.  The
pat.c code that calls it seems to ensure that it is not called for
non-ram areas by checking via pat_pagerange_is_ram().  It is important
that it only be called on the actual identity map because there *IS*
no map to sync for highmem pages, or for memory holes.

The ioremap.c uses are not as careful as those from pat.c, and call
kernel_map_sync_memtype() on PCI space which is in the middle of the
kernel identity map _range_, but is not actually mapped.

This patch adds a check to kernel_map_sync_memtype() which probably
duplicates some of the checks already in pat.c.  But, it is necessary
for the ioremap.c uses and shouldn't hurt other callers.

I have reproduced this bug and this patch fixes it for me and the
original bug reporter:

https://lkml.org/lkml/2013/2/5/396

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20130307163151.D9B58C4E@kernel.stglabs.ibm.com
Signed-off-by: Dave Hansen <dave@sr71.net>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

Merge tag 'regulator-3.9-rc1' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A few small things here and there, nothing major here really.  The
  conversion of twl4030ldo_ops to get_voltage_sel is a fix, as covered
  in the commit log it fixes inconsistency in handling of the IS_UNSUP()
  feature in the driver."

* tag 'regulator-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: fixed regulator_bulk_enable unwinding code
  regulator: twl: Convert twl4030ldo_ops to get_voltage_sel
  regulator: palmas: fix number of SMPS voltages
  regulator: core: fix documentation error in regulator_allow_bypass
  regulator: core: update kernel documentation for regulator_desc
  regulator: db8500-prcmu - remove incorrect __exit markup

Merge tag 'regmap-v3.9-rc1' of git://git./linux/kernel/git/broonie/regmap

Pull regmap PM fix from Mark Brown:
"A simple fix to stop us leaking a runtime PM reference in the case
where we fail to enable a device."

* tag 'regmap-v3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: call pm_runtime_put in pm_runtime_get_sync failed case

Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git./linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs

  The code cleanups fix up W=1 compiler warnings and some unnecessary
  checks.  The new Kconfig option, defaulting to N, allows the rarely
  used eCryptfs kernel to userspace communication channel to be compiled
  out.  This may be the first step in it being eventually removed."

Hmm.  I'm not sure whether these should be called "fixes", and it
probably should have gone in the merge window.  But I'll let it slide.

* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: allow userspace messaging to be disabled
  eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
  ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
  eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
  eCryptfs: remove unneeded checks in virt_to_scatterlist()
  eCryptfs: Fix -Wmissing-prototypes warnings
  eCryptfs: Fix -Wunused-but-set-variable warnings
  eCryptfs: initialize payload_len in keystore.c

Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon patches from Guenter Roeck:
"Bug fixes for sht15 and ltc2978 driver plus some documentation
  updates"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (sht15) Check return value of regulator_enable()
  hwmon: (adt7410) Document ADT7420 support
  hwmon: (pmbus/ltc2978) Use detected chip ID to select supported functionality
  hwmon: (pmbus/ltc2978) Fix peak attribute handling
  hwmon: (pmbus/ltc2978) Update datasheet links
  hwmon: Update my e-mail address in driver documentation

drm/radeon: don't check mipmap alignment if MIP_ADDRESS is FMASK

The MIP_ADDRESS state has 2 meanings. If the texture has one sample
per pixel, it's a pointer to the mipmap chain. If the texture has
multiple samples per pixel, it's a pointer to FMASK, a metadata buffer
needed for reading compressed MSAA textures. The mipmap
alignment rules do not apply to FMASK.

Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: skip MC reset as it's probably not hung

The MC is mostly likely busy (e.g., display requests), not hung
so no need to reset it. Doing an MC reset is tricky and not
particularly reliable. Fixes hangs in certain cases.

Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add primary dac adj quirk for R200 board

vbios values are wrong leading to colors that are
too bright. Use the default values instead.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/radeon: don't set hpd, afmt interrupts when interrupts are disabled

Avoids splatter if the interrupt handler is not registered due
to acceleration being disabled.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org

ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

x86, doc: Be explicit about what the x86 struct boot_params requires

If the sentinel triggers, we do not want the boot loader authors to
just poke it and make the error go away, we want them to actually fix
the problem.

This should help avoid making the incorrect change in non-compliant
bootloaders.

[ hpa: dropped the Documentation/x86/boot.txt hunk pending
clarifications ]

Signed-off-by: Peter Jones <pjones@redhat.com>
Link: http://lkml.kernel.org/r/1362592823-28967-1-git-send-email-pjones@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Don't clear efi_info even if the sentinel hits

When boot_params->sentinel is set, all we really know is that some
undefined set of fields in struct boot_params contain garbage. In the
particular case of efi_info, however, there is a private magic for
that substructure, so it is generally safe to leave it even if the
bootloader is broken.

kexec (for which we did the initial analysis) did not initialize this
field, but of course all the EFI bootloaders do, and most EFI
bootloaders are broken in this respect (and should be fixed.)

Reported-by: Robin Holt <holt@sgi.com>
Link: http://lkml.kernel.org/r/CA%2B5PVA51-FT14p4CRYKbicykugVb=PiaEycdQ57CK2km_OQuRQ@mail.gmail.com
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86, mm: Make sure to find a 2M free block for the first mapped area

Henrik reported that his MacAir 3.1 would not boot with

| commit 8d57470d8f859635deffe3919d7d4867b488b85a
| Date: Fri Nov 16 19:38:58 2012 -0800
|
| x86, mm: setup page table in top-down

It turns out that we do not calculate the real_end properly:
We try to get 2M size with 4K alignment, and later will round down
to 2M, so we will get less then 2M for first mapping, in extreme
case could be only 4K only. In Henrik's system it has (1M-32K) as
last usable rage is [mem 0x7f9db000-0x7fef8fff].

The problem is exposed when EFI booting have several holes and it
will force mapping to use PTE instead as we only map usable areas.

To fix it, just make it be 2M aligned, so we can be guaranteed to be
able to use large pages to map it.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Bisected-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQX4nQ7_1kg5RL_vh56rmcSHXUi1ExrZX7CwED4NGMnHfg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

x86: Fix 32-bit *_cpu_data initializers

The commit 27be457000211a6903968dfce06d5f73f051a217
('x86 idle: remove 32-bit-only "no-hlt" parameter, hlt_works_ok
flag') removed the hlt_works_ok flag from struct cpuinfo_x86, but
boot_cpu_data and new_cpu_data initializers were not changed
causing setting f00f_bug flag, instead of fdiv_bug.

If CONFIG_X86_F00F_BUG is not set the f00f_bug flag is never
cleared.

To avoid such problems in future C99-style initialization is now
used.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: len.brown@intel.com
Link: http://lkml.kernel.org/r/1362266082-2227-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next

A bunch of fixes, nothing truely horrible:
- Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs
  and subsequent fallout (Paulo)
- Fixup off-by-one in our hsw id table (Kenneth)
- Fixup ilk rc6 support (disabled by default), regression introduced in
  3.8
- g4x plane w/a from Egbert Eich
- gen2/3/4 dpms suspend/standy fixes for VGA outputs from Patrik Jakobsson
- Workaround dying ivb machines with less aggressive rc6 values (Stéphane
  Marchesin)

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Turn off hsync and vsync on ADPA when disabling crt
  drm/i915: Fix incorrect definition of ADPA HSYNC and VSYNC bits
  drm/i915: also disable south interrupts when handling them
  drm/i915: enable irqs earlier when resuming
  drm/i915: Increase the RC6p threshold.
  DRM/i915: On G45 enable cursor plane briefly after enabling the display plane.
  drm/i915: Fix Haswell/CRW PCI IDs.
  drm/i915: Don't clobber crtc->fb when queue_flip fails
  drm/i915: wait_event_timeout's timeout is in jiffies
  drm/i915: Fix missing variable initilization

ARM: 7667/1: perf: Fix section mismatch on armpmu_init()

WARNING: vmlinux.o(.text+0xfb80): Section mismatch in reference
from the function armpmu_register() to the function
.init.text:armpmu_init()
The function armpmu_register() references
the function __init armpmu_init().
This is often because armpmu_register lacks a __init
annotation or the annotation of armpmu_init is wrong.

Just drop the __init marking on armpmu_init() because
armpmu_register() no longer has an __init marking.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ARM: 7666/1: decompressor: add -mno-single-pic-base for building the decompressor

Before jumping to (position independent) C-code from the decompressor's
assembler world we set-up the C environment. This setup currently does not
set r9, which for arm-none-uclinux-uclibceabi toolchains is by default
expected to be the PIC offset base register (IE should point to the
beginning of the GOT).

Currently, therefore, in order to build working kernels that use the
decompressor it is necessary to use an arm-linux-gnueabi toolchain, or
similar. uClinux toolchains cause a prefetch abort to occur at the beginning
of the decompress_kernel function.

This patch allows uClinux toolchains to build bootable zImages by forcing
the -mno-single-pic-base option, which ensures that the location of the GOT
is re-derived each time it is required, and r9 becomes free for use as a
general purpose register.

This has a small (4% in instruction terms) advantage over the alternative of
setting r9 to point to the GOT before calling into the C-world.

Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge branch 'pm-fixes' into fixes

* pm-fixes:
  cpufreq / intel_pstate: Do not load on VM that does not report max P state.
  cpufreq / intel_pstate: Fix intel_pstate_init() error path
  PM / QoS: Remove device PM QoS sysfs attributes at the right place
  PM / QoS: Fix concurrency issues and memory leaks in device PM QoS
  cpufreq: highbank: do not initialize array with a loop
  PM / OPP: improve introductory documentation
  cpufreq: Fix a typo in comment
  mailbox, pl320-ipc: remove __init from probe function

Merge branch 'acpi-fixes' into fixes

* acpi-fixes:
  ACPI / glue: Drop .find_bridge() callback from struct acpi_bus_type
  ACPI / glue: Add .match() callback to struct acpi_bus_type
  ACPI / porocessor: Beautify code, pr->id is u32 which is never < 0
  ACPI / processor: Remove redundant NULL check before kfree
  ACPI / Sleep: Avoid interleaved message on errors

cpufreq / intel_pstate: Do not load on VM that does not report max P state.

It seems some VMs support the P state MSRs but return zeros. Fail
gracefully if we are running in this environment.

References: https://bugzilla.redhat.com/show_bug.cgi?id=916833
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

cpufreq / intel_pstate: Fix intel_pstate_init() error path

If cpufreq_register_driver() fails just free memory that has been
allocated and return. intel_pstate_exit() function is removed since we
are built-in only now there is no reason for a module exit procedure.

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

drm/i915: Turn off hsync and vsync on ADPA when disabling crt

According to PRM we need to disable hsync and vsync even though ADPA is
disabled. The previous code did infact do the opposite so we fix it.

Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56359
Tested-by: max <manikulin@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

efivarfs: return accurate error code in efivarfs_fill_super()

Joseph was hitting a failure case when mounting efivarfs which
resulted in an incorrect error message,

$ sudo mount -v /sys/firmware/efi/efivars mount: Cannot allocate memory

triggered when efivarfs_valid_name() returned -EINVAL.

Make sure we pass accurate return values up the stack if
efivarfs_fill_super() fails to build inodes for EFI variables.

Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

efivars: efivarfs_valid_name() should handle pstore syntax

Stricter validation was introduced with commit da27a24383b2b
("efivarfs: guid part of filenames are case-insensitive") and commit
47f531e8ba3b ("efivarfs: Validate filenames much more aggressively"),
which is necessary for the guid portion of efivarfs filenames, but we
don't need to be so strict with the first part, the variable name. The
UEFI specification doesn't impose any constraints on variable names
other than they be a NULL-terminated string.

The above commits caused a regression that resulted in users seeing
the following message,

  $ sudo mount -v /sys/firmware/efi/efivars mount: Cannot allocate memory

whenever pstore EFI variables were present in the variable store,
since their variable names failed to pass the following check,

    /* GUID should be right after the first '-' */
    if (s - 1 != strchr(str, '-'))

as a typical pstore filename is of the form, dump-type0-10-1-<guid>.
The fix is trivial since the guid portion of the filename is GUID_LEN
bytes, we can use (len - GUID_LEN) to ensure the '-' character is
where we expect it to be.

(The bogus ENOMEM error value will be fixed in a separate patch.)

Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

efi: be more paranoid about available space when creating variables

UEFI variables are typically stored in flash. For various reasons, avaiable
space is typically not reclaimed immediately upon the deletion of a
variable - instead, the system will garbage collect during initialisation
after a reboot.

Some systems appear to handle this garbage collection extremely poorly,
failing if more than 50% of the system flash is in use. This can result in
the machine refusing to boot. The safest thing to do for the moment is to
forbid writes if they'd end up using more than half of the storage space.
We can make this more finegrained later if we come up with a method for
identifying the broken machines.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>