Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
author	David S. Miller <davem@davemloft.net>
	Tue, 14 Apr 2015 22:51:19 +0000 (18:51 -0400)
committer	David S. Miller <davem@davemloft.net>
	Tue, 14 Apr 2015 22:51:19 +0000 (18:51 -0400)
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A final pull request. I know it's very late, but this time I think it's worth
a bit of a rush.

The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.

This also comes with a couple of small patches: one to fix the ebtables.h
userspace header, and another to get rid of an obsolete in-tree example file
that describes an nf_tables expression.

This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes are worth keeping.

Patrick McHardy says:

====================
netfilter: nf_tables: concatenation support

The following patches add support for concatenations, which allow
multi-dimensional exact matches in O(1).

The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and to make sure each register store always leaves the
full 32 bits in a well-defined state, meaning smaller stores will
zero the remaining bits.
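
As a rough sketch (following the in-tree definitions, with fields
abbreviated), the register file changes from four 16-byte slots into one
flat array of 32-bit words that the verdict register overlays:

        /* Before: 4 data registers of 16 bytes each (NFT_REG_1..4). */
        struct nft_data {
                union {
                        u32     data[4];
                        /* ... verdict code + chain pointer ... */
                };
        };

        /* After: a flat file of 32-bit words; NFT_REG32_00..15 address
         * single words, while the verdict occupies the leading slots. */
        struct nft_verdict {
                u32                     code;
                struct nft_chain        *chain;
        };

        struct nft_regs {
                union {
                        u32                     data[20]; /* verdict words + 16 data words */
                        struct nft_verdict      verdict;
                };
        };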

Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.
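
As a hypothetical evaluation-time sketch (identifiers illustrative, lookup
signature abridged), concatenating two 32-bit selectors means filling two
adjacent registers and handing them to the set as one 8-byte key:

        /* Two loads fill adjacent 32-bit registers ... */
        regs.data[8] = ip_saddr;        /* first key component  */
        regs.data[9] = ip_daddr;        /* second key component */

        /* ... and the lookup consumes them as a single 8-byte key,
         * so the two-dimensional match stays a single O(1) lookup. */
        found = set->ops->lookup(set, &regs.data[8]);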

Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.
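
A minimal sketch of how an element is then laid out, assuming the
nft_set_ext template helpers this series introduces (usage abridged):

        /* Build a per-set template describing which variable-sized
         * fields an element carries and how long they are. */
        struct nft_set_ext_tmpl tmpl;

        nft_set_ext_prepare(&tmpl);
        nft_set_ext_add_length(&tmpl, NFT_SET_EXT_KEY, set->klen);
        if (set->flags & NFT_SET_MAP)
                nft_set_ext_add_length(&tmpl, NFT_SET_EXT_DATA, set->dlen);

        /* Each element is a single allocation; key and data are found
         * through offsets recorded in the template. */
        key  = nft_set_ext_key(ext);
        data = nft_set_ext_data(ext);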

As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size, etc.
These are not done yet, however.

The patches are split up as follows:

 * the first five patches add length validation to register loads and
   stores to make sure we stay within bounds and prepare the validation
   functions for the new addressing mode

 * the next patches prepare for changing to 32 bit addressing by
   introducing a struct nft_regs, which holds the verdict register as
   well as the data registers. The verdict members are moved to a new
   struct nft_verdict to allow pulling struct nft_data off the stack.

 * the next patches contain preparatory conversions of expressions and
   sets to use 32 bit addressing

 * the next patch introduces so far unused register conversion helpers
   for parsing and dumping register numbers over netlink (see the
   sketch after this list)

 * following is the real conversion to 32 bit addressing, consisting of
   replacing struct nft_data in struct nft_regs by an array of u32s and
   actually translating and validating the new register numbers.

 * the final two patches add support for variable sized data items and
   variable sized keys / data in set elements
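
The register conversion helpers translate old 128-bit register numbers into
the new 32-bit addressing, roughly like this (close to the in-tree
implementation, error handling omitted):

        static unsigned int nft_parse_register(const struct nlattr *attr)
        {
                unsigned int reg = ntohl(nla_get_be32(attr));

                switch (reg) {
                case NFT_REG_VERDICT ... NFT_REG_4:
                        /* Old 16-byte registers map onto groups of
                         * four 32-bit words. */
                        return reg * NFT_REG_SIZE / NFT_REG32_SIZE;
                default:
                        /* New 32-bit registers map one-to-one. */
                        return reg + NFT_REG_SIZE / NFT_REG32_SIZE -
                               NFT_REG32_00;
                }
        }

nft_dump_register() performs the inverse mapping when dumping rules, which
is what keeps old nft binaries working.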

The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================

Patrick McHardy says:

====================
netfilter: nf_tables: dynamic stateful expression instantiation

The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.

Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.

In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.
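
In pseudo-C (a sketch of the mechanism, not the literal nft_dynset code;
names and signatures abridged), evaluation looks roughly like this:

        /* On a lookup miss, clone the stateful expression template and
         * insert a fresh element keyed by the packet's register data. */
        if (!set->ops->lookup(set, &regs->data[priv->sreg_key])) {
                elem = nft_dynset_new(set, expr, regs);
                set->ops->update(set, elem);
        }

        /* Every hit then evaluates the element's attached expression
         * (counter, limit, ...) and refreshes its timeout; expired
         * elements are purged by the regular set GC. */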

We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.
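
Conceptually, each element created from a limit template carries its own
token-bucket state, so the rate applies per key rather than per rule
(fields abbreviated from the in-tree nft_limit private data):

        struct nft_limit {
                u64     last;    /* timestamp of the last refill */
                u64     tokens;  /* remaining budget of this element */
                u64     rate;    /* configured rate */
                u64     nsecs;   /* refill period */
        };

Because this state lives in the set element rather than in the rule,
"limit 10/second" in example 1 below throttles each source address
independently.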

The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP; in particular, the listing of state
tables will most likely be separated from normal set listings
and use a more structured format:

1. Limit the rate of new SSH connections per host, similar to iptables
   hashlimit:

        flow ip saddr timeout 60s \
        limit 10/second \
        accept

2. Account network traffic between each set of /24 networks:

        flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
        counter

3. Account traffic to each host per user:

        flow skuid . ip daddr \
        counter

4. Account traffic for each combination of source address and TCP flags:

        flow ip saddr . tcp flags \
        counter

The resulting set content after an Xmas scan looks like this:

{
        192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
        192.168.122.1 . ack : counter packets 74 bytes 3848,
        192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}

In the future, the "expressions attached to elements" mechanism will be
extended to also support user-created non-stateful expressions, allowing
efficient selection between a set of parameter sets, e.g. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version, though.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
135 files changed:
Documentation/networking/rds.txt
arch/s390/hypfs/inode.c
crypto/algif_hash.c
crypto/algif_skcipher.c
drivers/char/mem.c
drivers/char/tile-srom.c
drivers/infiniband/hw/ipath/ipath_file_ops.c
drivers/infiniband/hw/qib/qib_file_ops.c
drivers/misc/mei/amthif.c
drivers/misc/mei/main.c
drivers/misc/mei/pci-me.c
drivers/net/ethernet/broadcom/bgmac.c
drivers/net/ethernet/broadcom/bgmac.h
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
drivers/net/ethernet/chelsio/cxgb4/sge.c
drivers/net/ethernet/emulex/benet/be.h
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/e1000e/e1000.h
drivers/net/ethernet/intel/e1000e/netdev.c
drivers/net/ethernet/toshiba/Kconfig
drivers/net/hyperv/hyperv_net.h
drivers/net/hyperv/netvsc.c
drivers/net/hyperv/netvsc_drv.c
drivers/net/phy/Kconfig
drivers/net/phy/broadcom.c
drivers/net/usb/usbnet.c
drivers/net/vxlan.c
drivers/scsi/sg.c
drivers/staging/unisys/include/timskmod.h
drivers/usb/gadget/function/f_fs.c
drivers/usb/gadget/legacy/inode.c
drivers/vhost/net.c
fs/9p/vfs_addr.c
fs/affs/file.c
fs/afs/misc.c
fs/afs/rxrpc.c
fs/afs/write.c
fs/aio.c
fs/bfs/inode.c
fs/block_dev.c
fs/btrfs/file.c
fs/btrfs/inode.c
fs/ceph/file.c
fs/direct-io.c
fs/ecryptfs/file.c
fs/ext2/inode.c
fs/ext3/inode.c
fs/ext4/file.c
fs/ext4/indirect.c
fs/ext4/inode.c
fs/ext4/page-io.c
fs/f2fs/data.c
fs/fat/inode.c
fs/fuse/cuse.c
fs/fuse/dev.c
fs/fuse/file.c
fs/fuse/fuse_i.h
fs/gfs2/aops.c
fs/gfs2/file.c
fs/hfs/inode.c
fs/hfsplus/inode.c
fs/jfs/inode.c
fs/nfs/direct.c
fs/nfs/file.c
fs/nilfs2/inode.c
fs/ntfs/file.c
fs/ntfs/inode.c
fs/ocfs2/aops.c
fs/ocfs2/aops.h
fs/ocfs2/file.c
fs/pipe.c
fs/read_write.c
fs/reiserfs/inode.c
fs/splice.c
fs/ubifs/file.c
fs/udf/file.c
fs/udf/inode.c
fs/xfs/xfs_aops.c
fs/xfs/xfs_file.c
include/linux/aio.h
include/linux/brcmphy.h
include/linux/fs.h
include/linux/net.h
include/linux/netdevice.h
include/linux/netlink.h
include/linux/rtnetlink.h
include/linux/socket.h
include/linux/uio.h
include/net/compat.h
include/net/inet_timewait_sock.h
include/net/sock.h
include/rxrpc/packet.h
kernel/printk/printk.c
kernel/sysctl.c
lib/iov_iter.c
mm/filemap.c
mm/page_io.c
mm/shmem.c
net/compat.c
net/core/datagram.c
net/core/dev.c
net/dccp/minisocks.c
net/ipv4/fou.c
net/ipv4/geneve.c
net/ipv4/inet_diag.c
net/ipv4/inet_hashtables.c
net/ipv4/inet_timewait_sock.c
net/ipv4/proc.c
net/ipv4/raw.c
net/ipv4/tcp.c
net/ipv4/tcp_input.c
net/ipv4/tcp_ipv4.c
net/ipv4/tcp_minisocks.c
net/ipv4/tcp_output.c
net/ipv6/inet6_hashtables.c
net/ipv6/ip6_vti.c
net/ipv6/tcp_ipv6.c
net/netfilter/nfnetlink_log.c
net/netfilter/nfnetlink_queue_core.c
net/netfilter/xt_TPROXY.c
net/nfc/netlink.c
net/rds/connection.c
net/rds/rds.h
net/rds/send.c
net/rxrpc/ar-input.c
net/rxrpc/ar-internal.h
net/rxrpc/ar-local.c
net/rxrpc/ar-output.c
net/sched/sch_ingress.c
net/sched/sch_netem.c
net/socket.c
net/sunrpc/svcsock.c
net/xfrm/xfrm_input.c
security/selinux/nlmsgtab.c
sound/core/pcm_native.c

index c67077cbeb800a1265f1e2ba801fb293fa0d43f8..e1a3d59bbe0f5f7e75889776dfb0979a7ef20cff 100644 (file)
@@ -62,11 +62,10 @@ Socket Interface
 ================
 
   AF_RDS, PF_RDS, SOL_RDS
-        These constants haven't been assigned yet, because RDS isn't in
-        mainline yet. Currently, the kernel module assigns some constant
-        and publishes it to user space through two sysctl files
-                /proc/sys/net/rds/pf_rds
-                /proc/sys/net/rds/sol_rds
+       AF_RDS and PF_RDS are the domain type to be used with socket(2)
+       to create RDS sockets. SOL_RDS is the socket-level to be used
+       with setsockopt(2) and getsockopt(2) for RDS specific socket
+       options.
 
   fd = socket(PF_RDS, SOCK_SEQPACKET, 0);
         This creates a new, unbound RDS socket.
index 99824ff8dd354e74ff421a2c9bb59243e045d541..df7d8cbee377a229c5f609d92ea7b72045c9cc86 100644 (file)
@@ -21,7 +21,7 @@
 #include <linux/module.h>
 #include <linux/seq_file.h>
 #include <linux/mount.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include <asm/ebcdic.h>
 #include "hypfs.h"
 
index 0a465e0f301291f3e63b4306905f25bcffa50284..1396ad0787fc6b84ebdd9a9d552ce7eaccf1e698 100644 (file)
@@ -56,8 +56,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
 
        ctx->more = 0;
 
-       while (iov_iter_count(&msg->msg_iter)) {
-               int len = iov_iter_count(&msg->msg_iter);
+       while (msg_data_left(msg)) {
+               int len = msg_data_left(msg);
 
                if (len > limit)
                        len = limit;
index 0aa02635ceda67a0591c6eb2a25a1d789a8a2012..945075292bc9584e57f4612bb1b7549a8e9e9b22 100644 (file)
@@ -106,7 +106,7 @@ static void skcipher_async_cb(struct crypto_async_request *req, int err)
        atomic_dec(&ctx->inflight);
        skcipher_free_async_sgls(sreq);
        kfree(req);
-       aio_complete(iocb, err, err);
+       iocb->ki_complete(iocb, err, err);
 }
 
 static inline int skcipher_sndbuf(struct sock *sk)
@@ -641,7 +641,7 @@ static int skcipher_recvmsg_sync(struct socket *sock, struct msghdr *msg,
        long copied = 0;
 
        lock_sock(sk);
-       while (iov_iter_count(&msg->msg_iter)) {
+       while (msg_data_left(msg)) {
                sgl = list_first_entry(&ctx->tsgl,
                                       struct skcipher_sg_list, list);
                sg = sgl->sg;
@@ -655,7 +655,7 @@ static int skcipher_recvmsg_sync(struct socket *sock, struct msghdr *msg,
                                goto unlock;
                }
 
-               used = min_t(unsigned long, ctx->used, iov_iter_count(&msg->msg_iter));
+               used = min_t(unsigned long, ctx->used, msg_data_left(msg));
 
                used = af_alg_make_sg(&ctx->rsgl, &msg->msg_iter, used);
                err = used;
index 297110c12635d3b8c36b53a7f089aa2666024eca..9c4fd7a8e2e5c466e6df5144082a22b868214d9f 100644 (file)
@@ -26,7 +26,7 @@
 #include <linux/pfn.h>
 #include <linux/export.h>
 #include <linux/io.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 #include <linux/uaccess.h>
 
index 02e76ac6d282d5a26a31598b05b4d89a6fefd671..69f6b4acc377143d87a54d5e46dd9710f4d53f73 100644 (file)
@@ -27,7 +27,6 @@
 #include <linux/types.h>       /* size_t */
 #include <linux/proc_fs.h>
 #include <linux/fcntl.h>       /* O_ACCMODE */
-#include <linux/aio.h>
 #include <linux/pagemap.h>
 #include <linux/hugetlb.h>
 #include <linux/uaccess.h>
index 6d7f453b4d05ef7da7f74aeafe22608b85dc00fc..aed8afee56da16a6a3609a247c9bea2c54060c44 100644 (file)
@@ -40,7 +40,6 @@
 #include <linux/slab.h>
 #include <linux/highmem.h>
 #include <linux/io.h>
-#include <linux/aio.h>
 #include <linux/jiffies.h>
 #include <linux/cpu.h>
 #include <asm/pgtable.h>
index 41937c6f888af13deadb6c7b25678cfc34596cf8..14046f5a37fa332cf5e5b25ba1a86a5fe7918188 100644 (file)
@@ -39,7 +39,6 @@
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
 #include <linux/io.h>
-#include <linux/aio.h>
 #include <linux/jiffies.h>
 #include <asm/pgtable.h>
 #include <linux/delay.h>
index c4cb9a984a5fb3965bba581eab0dd5c9096ec851..40ea639fa413a92f0239e1835894e73d36ddc1c1 100644 (file)
@@ -19,7 +19,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
-#include <linux/aio.h>
 #include <linux/ioctl.h>
 #include <linux/cdev.h>
 #include <linux/list.h>
index 3c019c0e60eb859ede0621f26c4d71d72fca8abe..47680c84801c766f158bf65c2e2dd3893fdff300 100644 (file)
@@ -22,7 +22,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
-#include <linux/aio.h>
 #include <linux/poll.h>
 #include <linux/init.h>
 #include <linux/ioctl.h>
index bd3039ab8f98e67e86de56ef8429dfd164c89d86..af44ee26075d8b520401a2cc52150b74d45157f0 100644 (file)
@@ -21,7 +21,6 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/fcntl.h>
-#include <linux/aio.h>
 #include <linux/pci.h>
 #include <linux/poll.h>
 #include <linux/ioctl.h>
index fa8f9e147c34b6ed2c49e4c7679cfb9bcac34298..5cb93d1f50a482c1aa3f7f855ca77d1d8b385740 100644 (file)
@@ -123,7 +123,7 @@ bgmac_dma_tx_add_buf(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
        struct bgmac_dma_desc *dma_desc;
        u32 ctl1;
 
-       if (i == ring->num_slots - 1)
+       if (i == BGMAC_TX_RING_SLOTS - 1)
                ctl0 |= BGMAC_DESC_CTL0_EOT;
 
        ctl1 = len & BGMAC_DESC_CTL1_LEN;
@@ -142,11 +142,10 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
 {
        struct device *dma_dev = bgmac->core->dma_dev;
        struct net_device *net_dev = bgmac->net_dev;
-       struct bgmac_slot_info *slot = &ring->slots[ring->end];
-       int free_slots;
+       int index = ring->end % BGMAC_TX_RING_SLOTS;
+       struct bgmac_slot_info *slot = &ring->slots[index];
        int nr_frags;
        u32 flags;
-       int index = ring->end;
        int i;
 
        if (skb->len > BGMAC_DESC_CTL1_LEN) {
@@ -159,12 +158,10 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
 
        nr_frags = skb_shinfo(skb)->nr_frags;
 
-       if (ring->start <= ring->end)
-               free_slots = ring->start - ring->end + BGMAC_TX_RING_SLOTS;
-       else
-               free_slots = ring->start - ring->end;
-
-       if (free_slots <= nr_frags + 1) {
+       /* ring->end - ring->start will return the number of valid slots,
+        * even when ring->end overflows
+        */
+       if (ring->end - ring->start + nr_frags + 1 >= BGMAC_TX_RING_SLOTS) {
                bgmac_err(bgmac, "TX ring is full, queue should be stopped!\n");
                netif_stop_queue(net_dev);
                return NETDEV_TX_BUSY;
@@ -200,7 +197,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
        }
 
        slot->skb = skb;
-
+       ring->end += nr_frags + 1;
        netdev_sent_queue(net_dev, skb->len);
 
        wmb();
@@ -208,13 +205,12 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
        /* Increase ring->end to point empty slot. We tell hardware the first
         * slot it should *not* read.
         */
-       ring->end = (index + 1) % BGMAC_TX_RING_SLOTS;
        bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_INDEX,
                    ring->index_base +
-                   ring->end * sizeof(struct bgmac_dma_desc));
+                   (ring->end % BGMAC_TX_RING_SLOTS) *
+                   sizeof(struct bgmac_dma_desc));
 
-       free_slots -= nr_frags + 1;
-       if (free_slots < 8)
+       if (ring->end - ring->start >= BGMAC_TX_RING_SLOTS - 8)
                netif_stop_queue(net_dev);
 
        return NETDEV_TX_OK;
@@ -256,17 +252,17 @@ static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
        empty_slot &= BGMAC_DMA_TX_STATDPTR;
        empty_slot /= sizeof(struct bgmac_dma_desc);
 
-       while (ring->start != empty_slot) {
-               struct bgmac_slot_info *slot = &ring->slots[ring->start];
-               u32 ctl1 = le32_to_cpu(ring->cpu_base[ring->start].ctl1);
-               int len = ctl1 & BGMAC_DESC_CTL1_LEN;
+       while (ring->start != ring->end) {
+               int slot_idx = ring->start % BGMAC_TX_RING_SLOTS;
+               struct bgmac_slot_info *slot = &ring->slots[slot_idx];
+               u32 ctl1;
+               int len;
 
-               if (!slot->dma_addr) {
-                       bgmac_err(bgmac, "Hardware reported transmission for empty TX ring slot %d! End of ring: %d\n",
-                                 ring->start, ring->end);
-                       goto next;
-               }
+               if (slot_idx == empty_slot)
+                       break;
 
+               ctl1 = le32_to_cpu(ring->cpu_base[slot_idx].ctl1);
+               len = ctl1 & BGMAC_DESC_CTL1_LEN;
                if (ctl1 & BGMAC_DESC_CTL0_SOF)
                        /* Unmap no longer used buffer */
                        dma_unmap_single(dma_dev, slot->dma_addr, len,
@@ -284,10 +280,8 @@ static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
                        slot->skb = NULL;
                }
 
-next:
                slot->dma_addr = 0;
-               if (++ring->start >= BGMAC_TX_RING_SLOTS)
-                       ring->start = 0;
+               ring->start++;
                freed = true;
        }
 
@@ -352,13 +346,13 @@ static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
                return -ENOMEM;
 
        /* Poison - if everything goes fine, hardware will overwrite it */
-       rx = buf;
+       rx = buf + BGMAC_RX_BUF_OFFSET;
        rx->len = cpu_to_le16(0xdead);
        rx->flags = cpu_to_le16(0xbeef);
 
        /* Map skb for the DMA */
-       dma_addr = dma_map_single(dma_dev, buf, BGMAC_RX_BUF_SIZE,
-                                 DMA_FROM_DEVICE);
+       dma_addr = dma_map_single(dma_dev, buf + BGMAC_RX_BUF_OFFSET,
+                                 BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
        if (dma_mapping_error(dma_dev, dma_addr)) {
                bgmac_err(bgmac, "DMA mapping error\n");
                put_page(virt_to_head_page(buf));
@@ -372,13 +366,23 @@ static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
        return 0;
 }
 
+static void bgmac_dma_rx_update_index(struct bgmac *bgmac,
+                                     struct bgmac_dma_ring *ring)
+{
+       dma_wmb();
+
+       bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_INDEX,
+                   ring->index_base +
+                   ring->end * sizeof(struct bgmac_dma_desc));
+}
+
 static void bgmac_dma_rx_setup_desc(struct bgmac *bgmac,
                                    struct bgmac_dma_ring *ring, int desc_idx)
 {
        struct bgmac_dma_desc *dma_desc = ring->cpu_base + desc_idx;
        u32 ctl0 = 0, ctl1 = 0;
 
-       if (desc_idx == ring->num_slots - 1)
+       if (desc_idx == BGMAC_RX_RING_SLOTS - 1)
                ctl0 |= BGMAC_DESC_CTL0_EOT;
        ctl1 |= BGMAC_RX_BUF_SIZE & BGMAC_DESC_CTL1_LEN;
        /* Is there any BGMAC device that requires extension? */
@@ -390,6 +394,21 @@ static void bgmac_dma_rx_setup_desc(struct bgmac *bgmac,
        dma_desc->addr_high = cpu_to_le32(upper_32_bits(ring->slots[desc_idx].dma_addr));
        dma_desc->ctl0 = cpu_to_le32(ctl0);
        dma_desc->ctl1 = cpu_to_le32(ctl1);
+
+       ring->end = desc_idx;
+}
+
+static void bgmac_dma_rx_poison_buf(struct device *dma_dev,
+                                   struct bgmac_slot_info *slot)
+{
+       struct bgmac_rx_header *rx = slot->buf + BGMAC_RX_BUF_OFFSET;
+
+       dma_sync_single_for_cpu(dma_dev, slot->dma_addr, BGMAC_RX_BUF_SIZE,
+                               DMA_FROM_DEVICE);
+       rx->len = cpu_to_le16(0xdead);
+       rx->flags = cpu_to_le16(0xbeef);
+       dma_sync_single_for_device(dma_dev, slot->dma_addr, BGMAC_RX_BUF_SIZE,
+                                  DMA_FROM_DEVICE);
 }
 
 static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
@@ -404,64 +423,53 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
        end_slot &= BGMAC_DMA_RX_STATDPTR;
        end_slot /= sizeof(struct bgmac_dma_desc);
 
-       ring->end = end_slot;
-
-       while (ring->start != ring->end) {
+       while (ring->start != end_slot) {
                struct device *dma_dev = bgmac->core->dma_dev;
                struct bgmac_slot_info *slot = &ring->slots[ring->start];
-               struct bgmac_rx_header *rx = slot->buf;
+               struct bgmac_rx_header *rx = slot->buf + BGMAC_RX_BUF_OFFSET;
                struct sk_buff *skb;
                void *buf = slot->buf;
+               dma_addr_t dma_addr = slot->dma_addr;
                u16 len, flags;
 
-               /* Unmap buffer to make it accessible to the CPU */
-               dma_sync_single_for_cpu(dma_dev, slot->dma_addr,
-                                       BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+               do {
+                       /* Prepare new skb as replacement */
+                       if (bgmac_dma_rx_skb_for_slot(bgmac, slot)) {
+                               bgmac_dma_rx_poison_buf(dma_dev, slot);
+                               break;
+                       }
 
-               /* Get info from the header */
-               len = le16_to_cpu(rx->len);
-               flags = le16_to_cpu(rx->flags);
+                       /* Unmap buffer to make it accessible to the CPU */
+                       dma_unmap_single(dma_dev, dma_addr,
+                                        BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
 
-               do {
-                       dma_addr_t old_dma_addr = slot->dma_addr;
-                       int err;
+                       /* Get info from the header */
+                       len = le16_to_cpu(rx->len);
+                       flags = le16_to_cpu(rx->flags);
 
                        /* Check for poison and drop or pass the packet */
                        if (len == 0xdead && flags == 0xbeef) {
                                bgmac_err(bgmac, "Found poisoned packet at slot %d, DMA issue!\n",
                                          ring->start);
-                               dma_sync_single_for_device(dma_dev,
-                                                          slot->dma_addr,
-                                                          BGMAC_RX_BUF_SIZE,
-                                                          DMA_FROM_DEVICE);
+                               put_page(virt_to_head_page(buf));
                                break;
                        }
 
-                       /* Omit CRC. */
-                       len -= ETH_FCS_LEN;
-
-                       /* Prepare new skb as replacement */
-                       err = bgmac_dma_rx_skb_for_slot(bgmac, slot);
-                       if (err) {
-                               /* Poison the old skb */
-                               rx->len = cpu_to_le16(0xdead);
-                               rx->flags = cpu_to_le16(0xbeef);
-
-                               dma_sync_single_for_device(dma_dev,
-                                                          slot->dma_addr,
-                                                          BGMAC_RX_BUF_SIZE,
-                                                          DMA_FROM_DEVICE);
+                       if (len > BGMAC_RX_ALLOC_SIZE) {
+                               bgmac_err(bgmac, "Found oversized packet at slot %d, DMA issue!\n",
+                                         ring->start);
+                               put_page(virt_to_head_page(buf));
                                break;
                        }
-                       bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
 
-                       /* Unmap old skb, we'll pass it to the netfif */
-                       dma_unmap_single(dma_dev, old_dma_addr,
-                                        BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
+                       /* Omit CRC. */
+                       len -= ETH_FCS_LEN;
 
                        skb = build_skb(buf, BGMAC_RX_ALLOC_SIZE);
-                       skb_put(skb, BGMAC_RX_FRAME_OFFSET + len);
-                       skb_pull(skb, BGMAC_RX_FRAME_OFFSET);
+                       skb_put(skb, BGMAC_RX_FRAME_OFFSET +
+                               BGMAC_RX_BUF_OFFSET + len);
+                       skb_pull(skb, BGMAC_RX_FRAME_OFFSET +
+                                BGMAC_RX_BUF_OFFSET);
 
                        skb_checksum_none_assert(skb);
                        skb->protocol = eth_type_trans(skb, bgmac->net_dev);
@@ -469,6 +477,8 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
                        handled++;
                } while (0);
 
+               bgmac_dma_rx_setup_desc(bgmac, ring, ring->start);
+
                if (++ring->start >= BGMAC_RX_RING_SLOTS)
                        ring->start = 0;
 
@@ -476,6 +486,8 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
                        break;
        }
 
+       bgmac_dma_rx_update_index(bgmac, ring);
+
        return handled;
 }
 
@@ -509,7 +521,7 @@ static void bgmac_dma_tx_ring_free(struct bgmac *bgmac,
        struct bgmac_slot_info *slot;
        int i;
 
-       for (i = 0; i < ring->num_slots; i++) {
+       for (i = 0; i < BGMAC_TX_RING_SLOTS; i++) {
                int len = dma_desc[i].ctl1 & BGMAC_DESC_CTL1_LEN;
 
                slot = &ring->slots[i];
@@ -534,21 +546,22 @@ static void bgmac_dma_rx_ring_free(struct bgmac *bgmac,
        struct bgmac_slot_info *slot;
        int i;
 
-       for (i = 0; i < ring->num_slots; i++) {
+       for (i = 0; i < BGMAC_RX_RING_SLOTS; i++) {
                slot = &ring->slots[i];
-               if (!slot->buf)
+               if (!slot->dma_addr)
                        continue;
 
-               if (slot->dma_addr)
-                       dma_unmap_single(dma_dev, slot->dma_addr,
-                                        BGMAC_RX_BUF_SIZE,
-                                        DMA_FROM_DEVICE);
+               dma_unmap_single(dma_dev, slot->dma_addr,
+                                BGMAC_RX_BUF_SIZE,
+                                DMA_FROM_DEVICE);
                put_page(virt_to_head_page(slot->buf));
+               slot->dma_addr = 0;
        }
 }
 
 static void bgmac_dma_ring_desc_free(struct bgmac *bgmac,
-                                    struct bgmac_dma_ring *ring)
+                                    struct bgmac_dma_ring *ring,
+                                    int num_slots)
 {
        struct device *dma_dev = bgmac->core->dma_dev;
        int size;
@@ -557,23 +570,33 @@ static void bgmac_dma_ring_desc_free(struct bgmac *bgmac,
        if (!ring->cpu_base)
            return;
 
        /* Free ring of descriptors */
-       size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+       size = num_slots * sizeof(struct bgmac_dma_desc);
        dma_free_coherent(dma_dev, size, ring->cpu_base,
                          ring->dma_base);
 }
 
-static void bgmac_dma_free(struct bgmac *bgmac)
+static void bgmac_dma_cleanup(struct bgmac *bgmac)
 {
        int i;
 
-       for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
+       for (i = 0; i < BGMAC_MAX_TX_RINGS; i++)
                bgmac_dma_tx_ring_free(bgmac, &bgmac->tx_ring[i]);
-               bgmac_dma_ring_desc_free(bgmac, &bgmac->tx_ring[i]);
-       }
-       for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
+
+       for (i = 0; i < BGMAC_MAX_RX_RINGS; i++)
                bgmac_dma_rx_ring_free(bgmac, &bgmac->rx_ring[i]);
-               bgmac_dma_ring_desc_free(bgmac, &bgmac->rx_ring[i]);
-       }
+}
+
+static void bgmac_dma_free(struct bgmac *bgmac)
+{
+       int i;
+
+       for (i = 0; i < BGMAC_MAX_TX_RINGS; i++)
+               bgmac_dma_ring_desc_free(bgmac, &bgmac->tx_ring[i],
+                                        BGMAC_TX_RING_SLOTS);
+
+       for (i = 0; i < BGMAC_MAX_RX_RINGS; i++)
+               bgmac_dma_ring_desc_free(bgmac, &bgmac->rx_ring[i],
+                                        BGMAC_RX_RING_SLOTS);
 }
 
 static int bgmac_dma_alloc(struct bgmac *bgmac)
@@ -596,11 +619,10 @@ static int bgmac_dma_alloc(struct bgmac *bgmac)
 
        for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
                ring = &bgmac->tx_ring[i];
-               ring->num_slots = BGMAC_TX_RING_SLOTS;
                ring->mmio_base = ring_base[i];
 
                /* Alloc ring of descriptors */
-               size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+               size = BGMAC_TX_RING_SLOTS * sizeof(struct bgmac_dma_desc);
                ring->cpu_base = dma_zalloc_coherent(dma_dev, size,
                                                     &ring->dma_base,
                                                     GFP_KERNEL);
@@ -621,14 +643,11 @@ static int bgmac_dma_alloc(struct bgmac *bgmac)
        }
 
        for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
-               int j;
-
                ring = &bgmac->rx_ring[i];
-               ring->num_slots = BGMAC_RX_RING_SLOTS;
                ring->mmio_base = ring_base[i];
 
                /* Alloc ring of descriptors */
-               size = ring->num_slots * sizeof(struct bgmac_dma_desc);
+               size = BGMAC_RX_RING_SLOTS * sizeof(struct bgmac_dma_desc);
                ring->cpu_base = dma_zalloc_coherent(dma_dev, size,
                                                     &ring->dma_base,
                                                     GFP_KERNEL);
@@ -645,15 +664,6 @@ static int bgmac_dma_alloc(struct bgmac *bgmac)
                        ring->index_base = lower_32_bits(ring->dma_base);
                else
                        ring->index_base = 0;
-
-               /* Alloc RX slots */
-               for (j = 0; j < ring->num_slots; j++) {
-                       err = bgmac_dma_rx_skb_for_slot(bgmac, &ring->slots[j]);
-                       if (err) {
-                               bgmac_err(bgmac, "Can't allocate skb for slot in RX ring\n");
-                               goto err_dma_free;
-                       }
-               }
        }
 
        return 0;
@@ -663,10 +673,10 @@ err_dma_free:
        return -ENOMEM;
 }
 
-static void bgmac_dma_init(struct bgmac *bgmac)
+static int bgmac_dma_init(struct bgmac *bgmac)
 {
        struct bgmac_dma_ring *ring;
-       int i;
+       int i, err;
 
        for (i = 0; i < BGMAC_MAX_TX_RINGS; i++) {
                ring = &bgmac->tx_ring[i];
@@ -698,16 +708,24 @@ static void bgmac_dma_init(struct bgmac *bgmac)
                if (ring->unaligned)
                        bgmac_dma_rx_enable(bgmac, ring);
 
-               for (j = 0; j < ring->num_slots; j++)
-                       bgmac_dma_rx_setup_desc(bgmac, ring, j);
-
-               bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_RX_INDEX,
-                           ring->index_base +
-                           ring->num_slots * sizeof(struct bgmac_dma_desc));
-
                ring->start = 0;
                ring->end = 0;
+               for (j = 0; j < BGMAC_RX_RING_SLOTS; j++) {
+                       err = bgmac_dma_rx_skb_for_slot(bgmac, &ring->slots[j]);
+                       if (err)
+                               goto error;
+
+                       bgmac_dma_rx_setup_desc(bgmac, ring, j);
+               }
+
+               bgmac_dma_rx_update_index(bgmac, ring);
        }
+
+       return 0;
+
+error:
+       bgmac_dma_cleanup(bgmac);
+       return err;
 }
 
 /**************************************************
@@ -1115,8 +1133,6 @@ static void bgmac_chip_reset(struct bgmac *bgmac)
        bgmac_phy_init(bgmac);
 
        netdev_reset_queue(bgmac->net_dev);
-
-       bgmac->int_status = 0;
 }
 
 static void bgmac_chip_intrs_on(struct bgmac *bgmac)
@@ -1185,11 +1201,8 @@ static void bgmac_enable(struct bgmac *bgmac)
 }
 
 /* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipinit */
-static void bgmac_chip_init(struct bgmac *bgmac, bool full_init)
+static void bgmac_chip_init(struct bgmac *bgmac)
 {
-       struct bgmac_dma_ring *ring;
-       int i;
-
        /* 1 interrupt per received frame */
        bgmac_write(bgmac, BGMAC_INT_RECV_LAZY, 1 << BGMAC_IRL_FC_SHIFT);
 
@@ -1207,16 +1220,7 @@ static void bgmac_chip_init(struct bgmac *bgmac, bool full_init)
 
        bgmac_write(bgmac, BGMAC_RXMAX_LENGTH, 32 + ETHER_MAX_LEN);
 
-       if (full_init) {
-               bgmac_dma_init(bgmac);
-               if (1) /* FIXME: is there any case we don't want IRQs? */
-                       bgmac_chip_intrs_on(bgmac);
-       } else {
-               for (i = 0; i < BGMAC_MAX_RX_RINGS; i++) {
-                       ring = &bgmac->rx_ring[i];
-                       bgmac_dma_rx_enable(bgmac, ring);
-               }
-       }
+       bgmac_chip_intrs_on(bgmac);
 
        bgmac_enable(bgmac);
 }
@@ -1231,14 +1235,13 @@ static irqreturn_t bgmac_interrupt(int irq, void *dev_id)
        if (!int_status)
                return IRQ_NONE;
 
-       /* Ack */
-       bgmac_write(bgmac, BGMAC_INT_STATUS, int_status);
+       int_status &= ~(BGMAC_IS_TX0 | BGMAC_IS_RX);
+       if (int_status)
+               bgmac_err(bgmac, "Unknown IRQs: 0x%08X\n", int_status);
 
        /* Disable new interrupts until handling existing ones */
        bgmac_chip_intrs_off(bgmac);
 
-       bgmac->int_status = int_status;
-
        napi_schedule(&bgmac->napi);
 
        return IRQ_HANDLED;
@@ -1247,25 +1250,17 @@ static irqreturn_t bgmac_interrupt(int irq, void *dev_id)
 static int bgmac_poll(struct napi_struct *napi, int weight)
 {
        struct bgmac *bgmac = container_of(napi, struct bgmac, napi);
-       struct bgmac_dma_ring *ring;
        int handled = 0;
 
-       if (bgmac->int_status & BGMAC_IS_TX0) {
-               ring = &bgmac->tx_ring[0];
-               bgmac_dma_tx_free(bgmac, ring);
-               bgmac->int_status &= ~BGMAC_IS_TX0;
-       }
+       /* Ack */
+       bgmac_write(bgmac, BGMAC_INT_STATUS, ~0);
 
-       if (bgmac->int_status & BGMAC_IS_RX) {
-               ring = &bgmac->rx_ring[0];
-               handled += bgmac_dma_rx_read(bgmac, ring, weight);
-               bgmac->int_status &= ~BGMAC_IS_RX;
-       }
+       bgmac_dma_tx_free(bgmac, &bgmac->tx_ring[0]);
+       handled += bgmac_dma_rx_read(bgmac, &bgmac->rx_ring[0], weight);
 
-       if (bgmac->int_status) {
-               bgmac_err(bgmac, "Unknown IRQs: 0x%08X\n", bgmac->int_status);
-               bgmac->int_status = 0;
-       }
+       /* Poll again if more events arrived in the meantime */
+       if (bgmac_read(bgmac, BGMAC_INT_STATUS) & (BGMAC_IS_TX0 | BGMAC_IS_RX))
+               return handled;
 
        if (handled < weight) {
                napi_complete(napi);
@@ -1285,23 +1280,27 @@ static int bgmac_open(struct net_device *net_dev)
        int err = 0;
 
        bgmac_chip_reset(bgmac);
+
+       err = bgmac_dma_init(bgmac);
+       if (err)
+               return err;
+
        /* Specs say about reclaiming rings here, but we do that in DMA init */
-       bgmac_chip_init(bgmac, true);
+       bgmac_chip_init(bgmac);
 
        err = request_irq(bgmac->core->irq, bgmac_interrupt, IRQF_SHARED,
                          KBUILD_MODNAME, net_dev);
        if (err < 0) {
                bgmac_err(bgmac, "IRQ request error: %d!\n", err);
-               goto err_out;
+               bgmac_dma_cleanup(bgmac);
+               return err;
        }
        napi_enable(&bgmac->napi);
 
        phy_start(bgmac->phy_dev);
 
        netif_carrier_on(net_dev);
-
-err_out:
-       return err;
+       return 0;
 }
 
 static int bgmac_stop(struct net_device *net_dev)
@@ -1317,6 +1316,7 @@ static int bgmac_stop(struct net_device *net_dev)
        free_irq(bgmac->core->irq, net_dev);
 
        bgmac_chip_reset(bgmac);
+       bgmac_dma_cleanup(bgmac);
 
        return 0;
 }
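
The reworked bgmac ring accounting above relies on free-running u32 indices:
ring->start and ring->end only ever increment, unsigned subtraction yields
the number of in-flight slots even across overflow, and a modulo by the
power-of-two ring size recovers the hardware descriptor index. A small
illustration of the invariant:

        /* Illustration only: end has wrapped past 2^32, start has not. */
        u32 start = 0xfffffff0, end = 0x00000010;

        u32 in_flight = end - start;                 /* 0x20 = 32 slots  */
        u32 free = BGMAC_TX_RING_SLOTS - in_flight;  /* 128 - 32 = 96    */
        u32 slot = end % BGMAC_TX_RING_SLOTS;        /* 0x10 = slot 16   */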
index 3ad965fe7fcc8f7e998eafcbe418aa65f171b760..db27febbb215cdc68ecfa2e3c0cda54c6a4fbf33 100644 (file)
 #define BGMAC_MAX_RX_RINGS                     1
 
 #define BGMAC_TX_RING_SLOTS                    128
-#define BGMAC_RX_RING_SLOTS                    512 - 1         /* Why -1? Well, Broadcom does that... */
+#define BGMAC_RX_RING_SLOTS                    512
 
 #define BGMAC_RX_HEADER_LEN                    28              /* Last 24 bytes are unused. Well... */
 #define BGMAC_RX_FRAME_OFFSET                  30              /* There are 2 unused bytes between header and real data */
+#define BGMAC_RX_BUF_OFFSET                    (NET_SKB_PAD + NET_IP_ALIGN - \
+                                                BGMAC_RX_FRAME_OFFSET)
 #define BGMAC_RX_MAX_FRAME_SIZE                        1536            /* Copied from b44/tg3 */
 #define BGMAC_RX_BUF_SIZE                      (BGMAC_RX_FRAME_OFFSET + BGMAC_RX_MAX_FRAME_SIZE)
-#define BGMAC_RX_ALLOC_SIZE                    (SKB_DATA_ALIGN(BGMAC_RX_BUF_SIZE) + \
+#define BGMAC_RX_ALLOC_SIZE                    (SKB_DATA_ALIGN(BGMAC_RX_BUF_SIZE + BGMAC_RX_BUF_OFFSET) + \
                                                 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define BGMAC_BFL_ENETROBO                     0x0010          /* has ephy roboswitch spi */
@@ -414,14 +416,13 @@ enum bgmac_dma_ring_type {
  * empty.
  */
 struct bgmac_dma_ring {
-       u16 num_slots;
-       u16 start;
-       u16 end;
+       u32 start;
+       u32 end;
 
-       u16 mmio_base;
        struct bgmac_dma_desc *cpu_base;
        dma_addr_t dma_base;
        u32 index_base; /* Used for unaligned rings only, otherwise 0 */
+       u16 mmio_base;
        bool unaligned;
 
        struct bgmac_slot_info slots[BGMAC_RX_RING_SLOTS];
@@ -452,7 +453,6 @@ struct bgmac {
 
        /* Int */
        u32 int_mask;
-       u32 int_status;
 
        /* Current MAC state */
        int mac_speed;
index 24e10ea3d5efa64c5280348acd2eefa0f04205cc..6de0544041568fa002db03eda7fb5fe7f379442d 100644 (file)
@@ -724,7 +724,8 @@ static irqreturn_t t4_nondata_intr(int irq, void *cookie)
                adap->swintr = 1;
                t4_write_reg(adap, MYPF_REG(PL_PF_INT_CAUSE_A), v);
        }
-       t4_slow_intr_handler(adap);
+       if (adap->flags & MASTER_PF)
+               t4_slow_intr_handler(adap);
        return IRQ_HANDLED;
 }
 
index e622214e2eca03266235de22adbd5412b0b7f4d3..0d2eddab04efbf7b2a0e1054ea46848273c97933 100644 (file)
  */
 #define NOMEM_TMR_IDX (SGE_NTIMERS - 1)
 
-/*
- * An FL with <= FL_STARVE_THRES buffers is starving and a periodic timer will
- * attempt to refill it.
- */
-#define FL_STARVE_THRES 4
-
 /*
  * Suspend an Ethernet Tx queue with fewer available descriptors than this.
  * This is the same as calc_tx_descs() for a TSO packet with
  * Max Tx descriptor space we allow for an Ethernet packet to be inlined
  * into a WR.
  */
-#define MAX_IMM_TX_PKT_LEN 128
+#define MAX_IMM_TX_PKT_LEN 256
 
 /*
  * Max size of a WR sent through a control Tx queue.
@@ -248,9 +242,21 @@ static inline unsigned int fl_cap(const struct sge_fl *fl)
        return fl->size - 8;   /* 1 descriptor = 8 buffers */
 }
 
-static inline bool fl_starving(const struct sge_fl *fl)
+/**
+ *     fl_starving - return whether a Free List is starving.
+ *     @adapter: pointer to the adapter
+ *     @fl: the Free List
+ *
+ *     Tests specified Free List to see whether the number of buffers
+ *     available to the hardware has falled below our "starvation"
+ *     threshold.
+ */
+static inline bool fl_starving(const struct adapter *adapter,
+                              const struct sge_fl *fl)
 {
-       return fl->avail - fl->pend_cred <= FL_STARVE_THRES;
+       const struct sge *s = &adapter->sge;
+
+       return fl->avail - fl->pend_cred <= s->fl_starve_thres;
 }
 
 static int map_skb(struct device *dev, const struct sk_buff *skb,
@@ -586,8 +592,10 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
        unsigned int cred = q->avail;
        __be64 *d = &q->desc[q->pidx];
        struct rx_sw_desc *sd = &q->sdesc[q->pidx];
+       int node;
 
        gfp |= __GFP_NOWARN;
+       node = dev_to_node(adap->pdev_dev);
 
        if (s->fl_pg_order == 0)
                goto alloc_small_pages;
@@ -596,7 +604,7 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
         * Prefer large buffers
         */
        while (n) {
-               pg = __dev_alloc_pages(gfp, s->fl_pg_order);
+               pg = alloc_pages_node(node, gfp | __GFP_COMP, s->fl_pg_order);
                if (unlikely(!pg)) {
                        q->large_alloc_failed++;
                        break;       /* fall back to single pages */
@@ -626,7 +634,7 @@ static unsigned int refill_fl(struct adapter *adap, struct sge_fl *q, int n,
 
 alloc_small_pages:
        while (n--) {
-               pg = __dev_alloc_page(gfp);
+               pg = alloc_pages_node(node, gfp, 0);
                if (unlikely(!pg)) {
                        q->alloc_failed++;
                        break;
@@ -655,7 +663,7 @@ out:        cred = q->avail - cred;
        q->pend_cred += cred;
        ring_fl_db(adap, q);
 
-       if (unlikely(fl_starving(q))) {
+       if (unlikely(fl_starving(adap, q))) {
                smp_wmb();
                set_bit(q->cntxt_id - adap->sge.egr_start,
                        adap->sge.starving_fl);
@@ -722,6 +730,22 @@ static void *alloc_ring(struct device *dev, size_t nelem, size_t elem_size,
  */
 static inline unsigned int sgl_len(unsigned int n)
 {
+       /* A Direct Scatter Gather List uses 32-bit lengths and 64-bit PCI DMA
+        * addresses.  The DSGL Work Request starts off with a 32-bit DSGL
+        * ULPTX header, then Length0, then Address0, then, for 1 <= i <= N,
+        * repeated sequences of { Length[i], Length[i+1], Address[i],
+        * Address[i+1] } (this ensures that all addresses are on 64-bit
+        * boundaries).  If N is even, then Length[N+1] should be set to 0 and
+        * Address[N+1] is omitted.
+        *
+        * The following calculation incorporates all of the above.  It's
+        * somewhat hard to follow but, briefly: the "+2" accounts for the
+        * first two flits which include the DSGL header, Length0 and
+        * Address0; the "(3*(n-1))/2" covers the main body of list entries (3
+        * flits for every pair of the remaining N) +1 if (n-1) is odd; and
+        * finally the "+((n-1)&1)" adds the one remaining flit needed if
+        * (n-1) is odd ...
+        */
        n--;
        return (3 * n) / 2 + (n & 1) + 2;
 }
@@ -769,12 +793,30 @@ static inline unsigned int calc_tx_flits(const struct sk_buff *skb)
        unsigned int flits;
        int hdrlen = is_eth_imm(skb);
 
+       /* If the skb is small enough, we can pump it out as a work request
+        * with only immediate data.  In that case we just have to have the
+        * TX Packet header plus the skb data in the Work Request.
+        */
+
        if (hdrlen)
                return DIV_ROUND_UP(skb->len + hdrlen, sizeof(__be64));
 
+       /* Otherwise, we're going to have to construct a Scatter gather list
+        * of the skb body and fragments.  We also include the flits necessary
+        * for the TX Packet Work Request and CPL.  We always have a firmware
+        * Write Header (incorporated as part of the cpl_tx_pkt_lso and
+        * cpl_tx_pkt structures), followed by either a TX Packet Write CPL
+        * message or, if we're doing a Large Send Offload, an LSO CPL message
+        * with an embedded TX Packet Write CPL message.
+        */
        flits = sgl_len(skb_shinfo(skb)->nr_frags + 1) + 4;
        if (skb_shinfo(skb)->gso_size)
-               flits += 2;
+               flits += (sizeof(struct fw_eth_tx_pkt_wr) +
+                         sizeof(struct cpl_tx_pkt_lso_core) +
+                         sizeof(struct cpl_tx_pkt_core)) / sizeof(__be64);
+       else
+               flits += (sizeof(struct fw_eth_tx_pkt_wr) +
+                         sizeof(struct cpl_tx_pkt_core)) / sizeof(__be64);
        return flits;
 }
 
@@ -2196,7 +2238,8 @@ static irqreturn_t t4_intr_msi(int irq, void *cookie)
 {
        struct adapter *adap = cookie;
 
-       t4_slow_intr_handler(adap);
+       if (adap->flags & MASTER_PF)
+               t4_slow_intr_handler(adap);
        process_intrq(adap);
        return IRQ_HANDLED;
 }
@@ -2211,7 +2254,8 @@ static irqreturn_t t4_intr_intx(int irq, void *cookie)
        struct adapter *adap = cookie;
 
        t4_write_reg(adap, MYPF_REG(PCIE_PF_CLI_A), 0);
-       if (t4_slow_intr_handler(adap) | process_intrq(adap))
+       if (((adap->flags & MASTER_PF) && t4_slow_intr_handler(adap)) |
+           process_intrq(adap))
                return IRQ_HANDLED;
        return IRQ_NONE;             /* probably shared interrupt */
 }
@@ -2248,7 +2292,7 @@ static void sge_rx_timer_cb(unsigned long data)
                        clear_bit(id, s->starving_fl);
                        smp_mb__after_atomic();
 
-                       if (fl_starving(fl)) {
+                       if (fl_starving(adap, fl)) {
                                rxq = container_of(fl, struct sge_eth_rxq, fl);
                                if (napi_reschedule(&rxq->rspq.napi))
                                        fl->starving++;
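
As a worked example of the sgl_len() arithmetic documented above
(illustration only, not driver code): for n = 3 DMA buffers the function
computes

        unsigned int n = 3;
        n--;                                /* n = 2 */
        flits = (3 * n) / 2 + (n & 1) + 2;  /* 3 + 0 + 2 = 5 flits */

i.e. two flits for the DSGL header plus Length0/Address0, and three flits
for the remaining pair of { Length, Length, Address, Address } entries.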
index 4b0494b9cc7cf034e8ebdc190d08e46a8a1e790e..1bf1cdce74ac3591d4a2011e6be9399c4a5cdf57 100644 (file)
@@ -99,6 +99,7 @@
 #define BE_NAPI_WEIGHT         64
 #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
 #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
+#define MAX_NUM_POST_ERX_DB    255u
 
 #define MAX_VFS                        30 /* Max VFs supported by BE3 FW */
 #define FW_VER_LEN             32
index 5ff7fba9b67c9d39043d1094193db714f7625a6b..fb0bc3c3620e9cf87983b1c425e0f24d431bffc9 100644 (file)
@@ -2122,7 +2122,7 @@ static void be_post_rx_frags(struct be_rx_obj *rxo, gfp_t gfp, u32 frags_needed)
                if (rxo->rx_post_starved)
                        rxo->rx_post_starved = false;
                do {
-                       notify = min(256u, posted);
+                       notify = min(MAX_NUM_POST_ERX_DB, posted);
                        be_rxq_notify(adapter, rxq->id, notify);
                        posted -= notify;
                } while (posted);
index a69f09e37b5893a02b2e0d521806769ea6f951c5..5d9ceb17b4cbad4f7e89cf0bb050e915f5b8d285 100644 (file)
@@ -343,6 +343,7 @@ struct e1000_adapter {
        struct timecounter tc;
        struct ptp_clock *ptp_clock;
        struct ptp_clock_info ptp_clock_info;
+       struct pm_qos_request pm_qos_req;
 
        u16 eee_advert;
 };
index 74ec185a697facce09174abda046db2b08495203..c509a5c900f5253973b24c9f966cb95f2fe1a2bc 100644 (file)
@@ -3297,9 +3297,9 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
                        ew32(RXDCTL(0), rxdctl | 0x3);
                }
 
-               pm_qos_update_request(&adapter->netdev->pm_qos_req, lat);
+               pm_qos_update_request(&adapter->pm_qos_req, lat);
        } else {
-               pm_qos_update_request(&adapter->netdev->pm_qos_req,
+               pm_qos_update_request(&adapter->pm_qos_req,
                                      PM_QOS_DEFAULT_VALUE);
        }
 
@@ -4403,7 +4403,7 @@ static int e1000_open(struct net_device *netdev)
                e1000_update_mng_vlan(adapter);
 
        /* DMA latency requirement to workaround jumbo issue */
-       pm_qos_add_request(&adapter->netdev->pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
+       pm_qos_add_request(&adapter->pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
                           PM_QOS_DEFAULT_VALUE);
 
        /* before we allocate an interrupt, we must be ready to handle it.
@@ -4514,7 +4514,7 @@ static int e1000_close(struct net_device *netdev)
            !test_bit(__E1000_TESTING, &adapter->state))
                e1000e_release_hw_control(adapter);
 
-       pm_qos_remove_request(&adapter->netdev->pm_qos_req);
+       pm_qos_remove_request(&adapter->pm_qos_req);
 
        pm_runtime_put_sync(&pdev->dev);
 
index 74acb5cf6099b262fa03549f7c15d2da533ad22d..5d244b6b5e3a08b48c5af68fbd0c4f65058f6fd1 100644 (file)
@@ -5,7 +5,7 @@
 config NET_VENDOR_TOSHIBA
        bool "Toshiba devices"
        default y
-       depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB || MIPS) || PPC_PS3
+       depends on PCI && (PPC_IBM_CELL_BLADE || MIPS) || PPC_PS3
        ---help---
          If you have a network (Ethernet) card belonging to this class, say Y
          and read the Ethernet-HOWTO, available from
@@ -42,7 +42,7 @@ config GELIC_WIRELESS
 
 config SPIDER_NET
        tristate "Spider Gigabit Ethernet driver"
-       depends on PCI && (PPC_IBM_CELL_BLADE || PPC_CELLEB)
+       depends on PCI && PPC_IBM_CELL_BLADE
        select FW_LOADER
        select SUNGEM_PHY
        ---help---
index f0b8b3e0ed7cdf8387d010b4967ca9b172e2c011..a10b31664709f51215435d94a9439c65d311221b 100644 (file)
@@ -132,6 +132,8 @@ struct hv_netvsc_packet {
 
        bool is_data_pkt;
        bool xmit_more; /* from skb */
+       bool cp_partial; /* partial copy into send buffer */
+
        u16 vlan_tci;
 
        u16 q_idx;
@@ -146,6 +148,9 @@ struct hv_netvsc_packet {
        /* This points to the memory after page_buf */
        struct rndis_message *rndis_msg;
 
+       u32 rmsg_size; /* RNDIS header and PPI size */
+       u32 rmsg_pgcnt; /* page count of RNDIS header and PPI */
+
        u32 total_data_buflen;
        /* Points to the send/receive buffer where the ethernet frame is */
        void *data;
index 4d4d497d5762896d037f7c8e447451f24dac824e..2e8ad0636b466668e8939e4eabe442160e6c2402 100644 (file)
@@ -703,15 +703,18 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
        u32 msg_size = 0;
        u32 padding = 0;
        u32 remain = packet->total_data_buflen % net_device->pkt_align;
+       u32 page_count = packet->cp_partial ? packet->rmsg_pgcnt :
+               packet->page_buf_cnt;
 
        /* Add padding */
-       if (packet->is_data_pkt && packet->xmit_more && remain) {
+       if (packet->is_data_pkt && packet->xmit_more && remain &&
+           !packet->cp_partial) {
                padding = net_device->pkt_align - remain;
                packet->rndis_msg->msg_len += padding;
                packet->total_data_buflen += padding;
        }
 
-       for (i = 0; i < packet->page_buf_cnt; i++) {
+       for (i = 0; i < page_count; i++) {
                char *src = phys_to_virt(packet->page_buf[i].pfn << PAGE_SHIFT);
                u32 offset = packet->page_buf[i].offset;
                u32 len = packet->page_buf[i].len;
@@ -739,6 +742,7 @@ static inline int netvsc_send_pkt(
        struct net_device *ndev = net_device->ndev;
        u64 req_id;
        int ret;
+       struct hv_page_buffer *pgbuf;
 
        nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
        if (packet->is_data_pkt) {
@@ -766,8 +770,10 @@ static inline int netvsc_send_pkt(
                return -ENODEV;
 
        if (packet->page_buf_cnt) {
+               pgbuf = packet->cp_partial ? packet->page_buf +
+                       packet->rmsg_pgcnt : packet->page_buf;
                ret = vmbus_sendpacket_pagebuffer(out_channel,
-                                                 packet->page_buf,
+                                                 pgbuf,
                                                  packet->page_buf_cnt,
                                                  &nvmsg,
                                                  sizeof(struct nvsp_message),
@@ -824,6 +830,7 @@ int netvsc_send(struct hv_device *device,
        unsigned long flag;
        struct multi_send_data *msdp;
        struct hv_netvsc_packet *msd_send = NULL, *cur_send = NULL;
+       bool try_batch;
 
        net_device = get_outbound_net_device(device);
        if (!net_device)
@@ -837,6 +844,7 @@ int netvsc_send(struct hv_device *device,
        }
        packet->channel = out_channel;
        packet->send_buf_index = NETVSC_INVALID_INDEX;
+       packet->cp_partial = false;
 
        msdp = &net_device->msd[q_idx];
 
@@ -845,12 +853,18 @@ int netvsc_send(struct hv_device *device,
        if (msdp->pkt)
                msd_len = msdp->pkt->total_data_buflen;
 
-       if (packet->is_data_pkt && msd_len > 0 &&
-           msdp->count < net_device->max_pkt &&
-           msd_len + pktlen + net_device->pkt_align <
+       try_batch = packet->is_data_pkt && msd_len > 0 && msdp->count <
+                   net_device->max_pkt;
+
+       if (try_batch && msd_len + pktlen + net_device->pkt_align <
            net_device->send_section_size) {
                section_index = msdp->pkt->send_buf_index;
 
+       } else if (try_batch && msd_len + packet->rmsg_size <
+                  net_device->send_section_size) {
+               section_index = msdp->pkt->send_buf_index;
+               packet->cp_partial = true;
+
        } else if (packet->is_data_pkt && pktlen + net_device->pkt_align <
                   net_device->send_section_size) {
                section_index = netvsc_get_next_send_section(net_device);
@@ -866,22 +880,26 @@ int netvsc_send(struct hv_device *device,
                netvsc_copy_to_send_buf(net_device,
                                        section_index, msd_len,
                                        packet);
-               if (!packet->part_of_skb) {
-                       skb = (struct sk_buff *)
-                               (unsigned long)
-                               packet->send_completion_tid;
-
-                       packet->send_completion_tid = 0;
-               }
 
-               packet->page_buf_cnt = 0;
                packet->send_buf_index = section_index;
-               packet->total_data_buflen += msd_len;
+
+               if (packet->cp_partial) {
+                       packet->page_buf_cnt -= packet->rmsg_pgcnt;
+                       packet->total_data_buflen = msd_len + packet->rmsg_size;
+               } else {
+                       packet->page_buf_cnt = 0;
+                       packet->total_data_buflen += msd_len;
+                       if (!packet->part_of_skb) {
+                               skb = (struct sk_buff *)(unsigned long)packet->
+                                      send_completion_tid;
+                               packet->send_completion_tid = 0;
+                       }
+               }
 
                if (msdp->pkt)
                        netvsc_xmit_completion(msdp->pkt);
 
-               if (packet->xmit_more) {
+               if (packet->xmit_more && !packet->cp_partial) {
                        msdp->pkt = packet;
                        msdp->count++;
                } else {
index 448716787e73c237f5de26f072ce03a28ab4986e..a3a9d3898a6e8a80ddb21cb11864c09006e47554 100644 (file)
@@ -277,15 +277,16 @@ static u32 fill_pg_buf(struct page *page, u32 offset, u32 len,
 }
 
 static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
-                          struct hv_page_buffer *pb)
+                          struct hv_netvsc_packet *packet)
 {
+       struct hv_page_buffer *pb = packet->page_buf;
        u32 slots_used = 0;
        char *data = skb->data;
        int frags = skb_shinfo(skb)->nr_frags;
        int i;
 
        /* The packet is laid out thus:
-        * 1. hdr
+        * 1. hdr: RNDIS header and PPI
         * 2. skb linear data
         * 3. skb fragment data
         */
@@ -294,6 +295,9 @@ static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
                                        offset_in_page(hdr),
                                        len, &pb[slots_used]);
 
+       packet->rmsg_size = len;
+       packet->rmsg_pgcnt = slots_used;
+
        slots_used += fill_pg_buf(virt_to_page(data),
                                offset_in_page(data),
                                skb_headlen(skb), &pb[slots_used]);
@@ -578,7 +582,7 @@ do_send:
        rndis_msg->msg_len += rndis_msg_size;
        packet->total_data_buflen = rndis_msg->msg_len;
        packet->page_buf_cnt = init_page_array(rndis_msg, rndis_msg_size,
-                                       skb, &page_buf[0]);
+                                              skb, packet);
 
        ret = netvsc_send(net_device_ctx->device_ctx, packet);
 
index 16adbc481772babbcb6113759fc47ad07e9dda20..8fadaa14b9f0fbd97b689d7bab562eccd30d17bf 100644 (file)
@@ -68,8 +68,8 @@ config SMSC_PHY
 config BROADCOM_PHY
        tristate "Drivers for Broadcom PHYs"
        ---help---
-         Currently supports the BCM5411, BCM5421, BCM5461, BCM5464, BCM5481
-         and BCM5482 PHYs.
+         Currently supports the BCM5411, BCM5421, BCM5461, BCM54616S, BCM5464,
+         BCM5481 and BCM5482 PHYs.
 
 config BCM63XX_PHY
        tristate "Drivers for Broadcom 63xx SOCs internal PHY"
index a52afb26421b8e975f3742d6ab77a2eddc133d8c..9c71295f2fefb5789693370c649c7ca9ceb34d92 100644 (file)
@@ -548,6 +548,19 @@ static struct phy_driver broadcom_drivers[] = {
        .ack_interrupt  = bcm54xx_ack_interrupt,
        .config_intr    = bcm54xx_config_intr,
        .driver         = { .owner = THIS_MODULE },
+}, {
+       .phy_id         = PHY_ID_BCM54616S,
+       .phy_id_mask    = 0xfffffff0,
+       .name           = "Broadcom BCM54616S",
+       .features       = PHY_GBIT_FEATURES |
+                         SUPPORTED_Pause | SUPPORTED_Asym_Pause,
+       .flags          = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
+       .config_init    = bcm54xx_config_init,
+       .config_aneg    = genphy_config_aneg,
+       .read_status    = genphy_read_status,
+       .ack_interrupt  = bcm54xx_ack_interrupt,
+       .config_intr    = bcm54xx_config_intr,
+       .driver         = { .owner = THIS_MODULE },
 }, {
        .phy_id         = PHY_ID_BCM5464,
        .phy_id_mask    = 0xfffffff0,
@@ -660,6 +673,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = {
        { PHY_ID_BCM5411, 0xfffffff0 },
        { PHY_ID_BCM5421, 0xfffffff0 },
        { PHY_ID_BCM5461, 0xfffffff0 },
+       { PHY_ID_BCM54616S, 0xfffffff0 },
        { PHY_ID_BCM5464, 0xfffffff0 },
        { PHY_ID_BCM5481, 0xfffffff0 },
        { PHY_ID_BCM5482, 0xfffffff0 },
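
For context, the 0xfffffff0 mask in these entries hides the revision nibble of the ID read back from the PHY, so any silicon revision of the part matches the one table row. A minimal sketch of that match rule, reusing the PHY_ID_BCM54616S value defined in the brcmphy.h hunk further down:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PHY_ID_BCM54616S 0x03625d10u

static bool phy_id_matches(uint32_t hw_id, uint32_t tbl_id, uint32_t mask)
{
        return (hw_id & mask) == (tbl_id & mask);
}

int main(void)
{
        /* a later revision (low nibble 0xc) still matches the entry */
        printf("%d\n", phy_id_matches(0x03625d1cu, PHY_ID_BCM54616S,
                                      0xfffffff0u));
        return 0;
}
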
index 777757ae19732ab10ab2283645464a87a02c7b20..733f4feb2ef3c5f11bbf99af962ecbb77253314b 100644 (file)
@@ -1072,7 +1072,7 @@ static void __handle_set_rx_mode(struct usbnet *dev)
  * especially now that control transfers can be queued.
  */
 static void
-kevent (struct work_struct *work)
+usbnet_deferred_kevent (struct work_struct *work)
 {
        struct usbnet           *dev =
                container_of(work, struct usbnet, kevent);
@@ -1626,7 +1626,7 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
        skb_queue_head_init(&dev->rxq_pause);
        dev->bh.func = usbnet_bh;
        dev->bh.data = (unsigned long) dev;
-       INIT_WORK (&dev->kevent, kevent);
+       INIT_WORK (&dev->kevent, usbnet_deferred_kevent);
        init_usb_anchor(&dev->deferred);
        dev->delay.function = usbnet_bh;
        dev->delay.data = (unsigned long) dev;
index 577c9b071ad9e8568d955a39ce00eb185e52e186..154116aafd0d8c5cb6caab9056a2245cbc3c783b 100644 (file)
@@ -1699,12 +1699,6 @@ static int vxlan6_xmit_skb(struct dst_entry *dst, struct sock *sk,
                }
        }
 
-       skb = iptunnel_handle_offloads(skb, udp_sum, type);
-       if (IS_ERR(skb)) {
-               err = -EINVAL;
-               goto err;
-       }
-
        skb_scrub_packet(skb, xnet);
 
        min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
@@ -1724,6 +1718,12 @@ static int vxlan6_xmit_skb(struct dst_entry *dst, struct sock *sk,
                goto err;
        }
 
+       skb = iptunnel_handle_offloads(skb, udp_sum, type);
+       if (IS_ERR(skb)) {
+               err = -EINVAL;
+               goto err;
+       }
+
        vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
        vxh->vx_flags = htonl(VXLAN_HF_VNI);
        vxh->vx_vni = md->vni;
@@ -1784,10 +1784,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
                }
        }
 
-       skb = iptunnel_handle_offloads(skb, udp_sum, type);
-       if (IS_ERR(skb))
-               return PTR_ERR(skb);
-
        min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
                        + VXLAN_HLEN + sizeof(struct iphdr)
                        + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
@@ -1803,6 +1799,10 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
        if (WARN_ON(!skb))
                return -ENOMEM;
 
+       skb = iptunnel_handle_offloads(skb, udp_sum, type);
+       if (IS_ERR(skb))
+               return PTR_ERR(skb);
+
        vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
        vxh->vx_flags = htonl(VXLAN_HF_VNI);
        vxh->vx_vni = md->vni;
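
Both vxlan hunks move iptunnel_handle_offloads() after the headroom expansion. A plausible reading is that both steps can replace the skb -- headroom growth via the reallocation behind skb_cow_head(), and the offload handling itself -- so offload state should be computed on the buffer that will actually be transmitted, and every later step must use the returned pointer. A userspace model of that ordering constraint, with stand-in types rather than the kernel API:

#include <stdio.h>
#include <stdlib.h>

struct buf { size_t headroom; char data[64]; };

/* may replace the buffer, like the reallocation behind skb_cow_head() */
static struct buf *grow_headroom(struct buf *b, size_t need)
{
        struct buf *n;

        if (b->headroom >= need)
                return b;
        n = malloc(sizeof(*n));
        if (!n) {
                free(b);
                return NULL;
        }
        *n = *b;
        n->headroom = need;
        free(b);
        return n;
}

/* may also replace the buffer, like iptunnel_handle_offloads() */
static struct buf *handle_offloads(struct buf *b)
{
        return b;
}

int main(void)
{
        struct buf *skb = calloc(1, sizeof(*skb));

        if (!skb)
                return 1;
        skb = grow_headroom(skb, 50);   /* first: make room */
        if (!skb)
                return 1;
        skb = handle_offloads(skb);     /* then: offloads on the final buffer */
        if (!skb)
                return 1;
        puts("push the vxlan header last");
        free(skb);
        return 0;
}
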
index 2270bd51f9c2c240c669e562eb77052f89425a83..d383f84869aa45475bb25f149123092a0f3beb0c 100644 (file)
@@ -33,7 +33,6 @@ static int sg_version_num = 30536;    /* 2 digits for each component */
 #include <linux/sched.h>
 #include <linux/string.h>
 #include <linux/mm.h>
-#include <linux/aio.h>
 #include <linux/errno.h>
 #include <linux/mtio.h>
 #include <linux/ioctl.h>
@@ -51,6 +50,7 @@ static int sg_version_num = 30536;    /* 2 digits for each component */
 #include <linux/mutex.h>
 #include <linux/atomic.h>
 #include <linux/ratelimit.h>
+#include <linux/uio.h>
 
 #include "scsi.h"
 #include <scsi/scsi_dbg.h>
index 4019a0d63645fe53380caec301679a68f9979e87..52648d4d99220cc6916cef894299dd4102248097 100644 (file)
@@ -46,7 +46,6 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/fcntl.h>
-#include <linux/aio.h>
 #include <linux/workqueue.h>
 #include <linux/kthread.h>
 #include <linux/seq_file.h>
index 175c9956cbe3a36949526029103d38b4c97225c3..a12315a78248d4a40349586062e043b8574d1308 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/export.h>
 #include <linux/hid.h>
 #include <linux/module.h>
+#include <linux/uio.h>
 #include <asm/unaligned.h>
 
 #include <linux/usb/composite.h>
@@ -655,9 +656,10 @@ static void ffs_user_copy_worker(struct work_struct *work)
                unuse_mm(io_data->mm);
        }
 
-       aio_complete(io_data->kiocb, ret, ret);
+       io_data->kiocb->ki_complete(io_data->kiocb, ret, ret);
 
-       if (io_data->ffs->ffs_eventfd && !io_data->kiocb->ki_eventfd)
+       if (io_data->ffs->ffs_eventfd &&
+           !(io_data->kiocb->ki_flags & IOCB_EVENTFD))
                eventfd_signal(io_data->ffs->ffs_eventfd, 1);
 
        usb_ep_free_request(io_data->ep, io_data->req);
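
Two replacements happen at once here: completion is reported through iocb->ki_complete instead of the removed aio_complete(), and the eventfd question is answered by the IOCB_EVENTFD bit rather than by peeking at the aio core's ki_eventfd pointer. A small model of the resulting contract, assuming only the IOCB_EVENTFD flag introduced in the linux/fs.h hunk near the end of this series:

#include <stdio.h>

#define IOCB_EVENTFD (1 << 0)

struct kiocb_model {
        int ki_flags;
        void (*ki_complete)(struct kiocb_model *iocb, long ret, long ret2);
};

static void report(struct kiocb_model *iocb, long ret, long ret2)
{
        (void)iocb;
        printf("ki_complete(%ld, %ld)\n", ret, ret2);
}

static void finish(struct kiocb_model *iocb, long ret, int own_eventfd)
{
        iocb->ki_complete(iocb, ret, ret);
        /* signal a driver-private eventfd only when the aio core will not
         * already signal one for this iocb */
        if (own_eventfd && !(iocb->ki_flags & IOCB_EVENTFD))
                puts("eventfd_signal(driver eventfd)");
}

int main(void)
{
        struct kiocb_model iocb = {
                .ki_flags    = IOCB_EVENTFD,
                .ki_complete = report,
        };

        finish(&iocb, 100, 1);  /* aio core signals; driver stays quiet */
        return 0;
}
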
index 200f9a584064fd9199ba99ff75a2e26a33c788f7..662ef2c1c62b67d0340cbf8593d515f14f2568dc 100644 (file)
@@ -26,6 +26,7 @@
 #include <linux/poll.h>
 #include <linux/mmu_context.h>
 #include <linux/aio.h>
+#include <linux/uio.h>
 
 #include <linux/device.h>
 #include <linux/moduleparam.h>
@@ -469,7 +470,7 @@ static void ep_user_copy_worker(struct work_struct *work)
                ret = -EFAULT;
 
        /* completing the iocb can drop the ctx and mm, don't touch mm after */
-       aio_complete(iocb, ret, ret);
+       iocb->ki_complete(iocb, ret, ret);
 
        kfree(priv->buf);
        kfree(priv->to_free);
@@ -497,7 +498,8 @@ static void ep_aio_complete(struct usb_ep *ep, struct usb_request *req)
                kfree(priv);
                iocb->private = NULL;
                /* aio_complete() reports bytes-transferred _and_ faults */
-               aio_complete(iocb, req->actual ? req->actual : req->status,
+
+               iocb->ki_complete(iocb, req->actual ? req->actual : req->status,
                                req->status);
        } else {
                /* ep_copy_to_user() won't report both; we hide some faults */
index 18f05bff8826672a17a645d8fddc2a7df489e302..7d137a43cc86842ed98bc27e63d6dfcb2042f177 100644 (file)
@@ -357,13 +357,13 @@ static void handle_tx(struct vhost_net *net)
                iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
                iov_iter_advance(&msg.msg_iter, hdr_size);
                /* Sanity check */
-               if (!iov_iter_count(&msg.msg_iter)) {
+               if (!msg_data_left(&msg)) {
                        vq_err(vq, "Unexpected header len for TX: "
                               "%zd expected %zd\n",
                               len, hdr_size);
                        break;
                }
-               len = iov_iter_count(&msg.msg_iter);
+               len = msg_data_left(&msg);
 
                zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
                                   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
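
msg_data_left() is simply a friendlier spelling of iov_iter_count() on the message's iterator, so callers stop reaching into msg.msg_iter directly. Modeled with stripped-down stand-in structs:

#include <stddef.h>
#include <stdio.h>

struct iov_iter_model { size_t count; };
struct msghdr_model   { struct iov_iter_model msg_iter; };

/* equivalent of: iov_iter_count(&msg->msg_iter) */
static size_t msg_data_left(const struct msghdr_model *msg)
{
        return msg->msg_iter.count;
}

int main(void)
{
        struct msghdr_model msg = { .msg_iter = { .count = 1500 } };

        if (!msg_data_left(&msg))
                puts("unexpected header len");  /* the vq_err() branch */
        printf("len = %zu\n", msg_data_left(&msg));
        return 0;
}
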
index eb14e055ea83e8509e7ea6ae569e3c1966d3b896..ff1a5bac420098d528a46c6cccecfa1caa73c421 100644 (file)
@@ -33,7 +33,7 @@
 #include <linux/pagemap.h>
 #include <linux/idr.h>
 #include <linux/sched.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include <net/9p/9p.h>
 #include <net/9p/client.h>
 
index a91795e01a7ff0c0e85abf1bdf69f3d1d828b231..3aa7eb66547ea31b7556c1502d3900dce0aa80dd 100644 (file)
@@ -12,7 +12,7 @@
  *  affs regular file handling primitives
  */
 
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include "affs.h"
 
 static struct buffer_head *affs_get_extblock_slow(struct inode *inode, u32 ext);
index 0dd4dafee10b391f7d92b89f8c4e491ed5f09810..91ea1aa0d8b3ab0a817b525e9f9b3deec98f775f 100644 (file)
 int afs_abort_to_error(u32 abort_code)
 {
        switch (abort_code) {
+       /* low errno codes inserted into abort namespace */
        case 13:                return -EACCES;
        case 27:                return -EFBIG;
        case 30:                return -EROFS;
+
+       /* VICE "special error" codes; 101 - 111 */
        case VSALVAGE:          return -EIO;
        case VNOVNODE:          return -ENOENT;
        case VNOVOL:            return -ENOMEDIUM;
@@ -36,11 +39,18 @@ int afs_abort_to_error(u32 abort_code)
        case VOVERQUOTA:        return -EDQUOT;
        case VBUSY:             return -EBUSY;
        case VMOVED:            return -ENXIO;
-       case 0x2f6df0a:         return -EWOULDBLOCK;
+
+       /* Unified AFS error table; ET "uae" == 0x2f6df00 */
+       case 0x2f6df00:         return -EPERM;
+       case 0x2f6df01:         return -ENOENT;
+       case 0x2f6df04:         return -EIO;
+       case 0x2f6df0a:         return -EAGAIN;
+       case 0x2f6df0b:         return -ENOMEM;
        case 0x2f6df0c:         return -EACCES;
        case 0x2f6df0f:         return -EBUSY;
        case 0x2f6df10:         return -EEXIST;
        case 0x2f6df11:         return -EXDEV;
+       case 0x2f6df12:         return -ENODEV;
        case 0x2f6df13:         return -ENOTDIR;
        case 0x2f6df14:         return -EISDIR;
        case 0x2f6df15:         return -EINVAL;
@@ -54,8 +64,12 @@ int afs_abort_to_error(u32 abort_code)
        case 0x2f6df23:         return -ENAMETOOLONG;
        case 0x2f6df24:         return -ENOLCK;
        case 0x2f6df26:         return -ENOTEMPTY;
+       case 0x2f6df28:         return -EWOULDBLOCK;
+       case 0x2f6df69:         return -ENOTCONN;
+       case 0x2f6df6c:         return -ETIMEDOUT;
        case 0x2f6df78:         return -EDQUOT;
 
+       /* RXKAD abort codes; from include/rxrpc/packet.h.  ET "RXK" == 0x1260B00 */
        case RXKADINCONSISTENCY: return -EPROTO;
        case RXKADPACKETSHORT:  return -EPROTO;
        case RXKADLEVELFAIL:    return -EKEYREJECTED;
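
As the new comments note, the "uae" block is a com_err-style error table (hence the ET prefix): every abort code is the base 0x2f6df00 plus a small local code, which is why the case labels above run in near-sequence. A quick spot check of that composition:

#include <stdio.h>

#define UAE_BASE 0x2f6df00u

int main(void)
{
        printf("0x%x\n", UAE_BASE + 0x0a);      /* 0x2f6df0a, -EAGAIN above */
        printf("0x%x\n", UAE_BASE + 0x28);      /* 0x2f6df28, -EWOULDBLOCK */
        return 0;
}
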
index dbc732e9a5c01eb18ab91af910a997881dfe5fd8..3a57a1b0fb510b8c8167835ca62eb06d0c4b53ca 100644 (file)
@@ -770,15 +770,12 @@ static int afs_deliver_cm_op_id(struct afs_call *call, struct sk_buff *skb,
 void afs_send_empty_reply(struct afs_call *call)
 {
        struct msghdr msg;
-       struct kvec iov[1];
 
        _enter("");
 
-       iov[0].iov_base         = NULL;
-       iov[0].iov_len          = 0;
        msg.msg_name            = NULL;
        msg.msg_namelen         = 0;
-       iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, iov, 0, 0);     /* WTF? */
+       iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, NULL, 0, 0);
        msg.msg_control         = NULL;
        msg.msg_controllen      = 0;
        msg.msg_flags           = 0;
index c13cb08964eda91afe26754733054147e220ecb7..0714abcd7f32321754287e46aec129196832e2ef 100644 (file)
@@ -14,7 +14,6 @@
 #include <linux/pagemap.h>
 #include <linux/writeback.h>
 #include <linux/pagevec.h>
-#include <linux/aio.h>
 #include "internal.h"
 
 static int afs_write_back_from_locked_page(struct afs_writeback *wb,
index f8e52a1854c1ab383e32383ac65a0f167e385793..435ca29eca31431649b4df125e933488b8d4073e 100644 (file)
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -151,6 +151,38 @@ struct kioctx {
        unsigned                id;
 };
 
+/*
+ * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
+ * cancelled or completed (this makes a certain amount of sense because
+ * successful cancellation - io_cancel() - does deliver the completion to
+ * userspace).
+ *
+ * And since most things don't implement kiocb cancellation and we'd really like
+ * kiocb completion to be lockless when possible, we use ki_cancel to
+ * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
+ * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
+ */
+#define KIOCB_CANCELLED                ((void *) (~0ULL))
+
+struct aio_kiocb {
+       struct kiocb            common;
+
+       struct kioctx           *ki_ctx;
+       kiocb_cancel_fn         *ki_cancel;
+
+       struct iocb __user      *ki_user_iocb;  /* user's aiocb */
+       __u64                   ki_user_data;   /* user's data for completion */
+
+       struct list_head        ki_list;        /* the aio core uses this
+                                                * for cancellation */
+
+       /*
+        * If the aio_resfd field of the userspace iocb is not zero,
+        * this is the underlying eventfd context to deliver events to.
+        */
+       struct eventfd_ctx      *ki_eventfd;
+};
+
 /*------ sysctl variables----*/
 static DEFINE_SPINLOCK(aio_nr_lock);
 unsigned long aio_nr;          /* current system wide number of aio requests */
@@ -220,7 +252,7 @@ static int __init aio_setup(void)
        if (IS_ERR(aio_mnt))
                panic("Failed to create aio fs mount.");
 
-       kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
+       kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
        kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
 
        pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
@@ -480,8 +512,9 @@ static int aio_setup_ring(struct kioctx *ctx)
 #define AIO_EVENTS_FIRST_PAGE  ((PAGE_SIZE - sizeof(struct aio_ring)) / sizeof(struct io_event))
 #define AIO_EVENTS_OFFSET      (AIO_EVENTS_PER_PAGE - AIO_EVENTS_FIRST_PAGE)
 
-void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel)
+void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 {
+       struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
        struct kioctx *ctx = req->ki_ctx;
        unsigned long flags;
 
@@ -496,7 +529,7 @@ void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel)
 }
 EXPORT_SYMBOL(kiocb_set_cancel_fn);
 
-static int kiocb_cancel(struct kiocb *kiocb)
+static int kiocb_cancel(struct aio_kiocb *kiocb)
 {
        kiocb_cancel_fn *old, *cancel;
 
@@ -514,7 +547,7 @@ static int kiocb_cancel(struct kiocb *kiocb)
                cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
        } while (cancel != old);
 
-       return cancel(kiocb);
+       return cancel(&kiocb->common);
 }
 
 static void free_ioctx(struct work_struct *work)
@@ -550,13 +583,13 @@ static void free_ioctx_reqs(struct percpu_ref *ref)
 static void free_ioctx_users(struct percpu_ref *ref)
 {
        struct kioctx *ctx = container_of(ref, struct kioctx, users);
-       struct kiocb *req;
+       struct aio_kiocb *req;
 
        spin_lock_irq(&ctx->ctx_lock);
 
        while (!list_empty(&ctx->active_reqs)) {
                req = list_first_entry(&ctx->active_reqs,
-                                      struct kiocb, ki_list);
+                                      struct aio_kiocb, ki_list);
 
                list_del_init(&req->ki_list);
                kiocb_cancel(req);
@@ -778,22 +811,6 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
        return 0;
 }
 
-/* wait_on_sync_kiocb:
- *     Waits on the given sync kiocb to complete.
- */
-ssize_t wait_on_sync_kiocb(struct kiocb *req)
-{
-       while (!req->ki_ctx) {
-               set_current_state(TASK_UNINTERRUPTIBLE);
-               if (req->ki_ctx)
-                       break;
-               io_schedule();
-       }
-       __set_current_state(TASK_RUNNING);
-       return req->ki_user_data;
-}
-EXPORT_SYMBOL(wait_on_sync_kiocb);
-
 /*
  * exit_aio: called when the last user of mm goes away.  At this point, there is
 * no way for any new requests to be submitted or any of the io_* syscalls to be
@@ -948,9 +965,9 @@ static void user_refill_reqs_available(struct kioctx *ctx)
  *     Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
  */
-static inline struct kiocb *aio_get_req(struct kioctx *ctx)
+static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
-       struct kiocb *req;
+       struct aio_kiocb *req;
 
        if (!get_reqs_available(ctx)) {
                user_refill_reqs_available(ctx);
@@ -971,10 +988,10 @@ out_put:
        return NULL;
 }
 
-static void kiocb_free(struct kiocb *req)
+static void kiocb_free(struct aio_kiocb *req)
 {
-       if (req->ki_filp)
-               fput(req->ki_filp);
+       if (req->common.ki_filp)
+               fput(req->common.ki_filp);
        if (req->ki_eventfd != NULL)
                eventfd_ctx_put(req->ki_eventfd);
        kmem_cache_free(kiocb_cachep, req);
@@ -1010,8 +1027,9 @@ out:
 /* aio_complete
  *     Called when the io request on the given iocb is complete.
  */
-void aio_complete(struct kiocb *iocb, long res, long res2)
+static void aio_complete(struct kiocb *kiocb, long res, long res2)
 {
+       struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
        struct kioctx   *ctx = iocb->ki_ctx;
        struct aio_ring *ring;
        struct io_event *ev_page, *event;
@@ -1025,13 +1043,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
         *    ref, no other paths have a way to get another ref
         *  - the sync task helpfully left a reference to itself in the iocb
         */
-       if (is_sync_kiocb(iocb)) {
-               iocb->ki_user_data = res;
-               smp_wmb();
-               iocb->ki_ctx = ERR_PTR(-EXDEV);
-               wake_up_process(iocb->ki_obj.tsk);
-               return;
-       }
+       BUG_ON(is_sync_kiocb(kiocb));
 
        if (iocb->ki_list.next) {
                unsigned long flags;
@@ -1057,7 +1069,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
        ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
        event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
-       event->obj = (u64)(unsigned long)iocb->ki_obj.user;
+       event->obj = (u64)(unsigned long)iocb->ki_user_iocb;
        event->data = iocb->ki_user_data;
        event->res = res;
        event->res2 = res2;
@@ -1066,7 +1078,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
        flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
        pr_debug("%p[%u]: %p: %p %Lx %lx %lx\n",
-                ctx, tail, iocb, iocb->ki_obj.user, iocb->ki_user_data,
+                ctx, tail, iocb, iocb->ki_user_iocb, iocb->ki_user_data,
                 res, res2);
 
        /* after flagging the request as done, we
@@ -1113,7 +1125,6 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
 
        percpu_ref_put(&ctx->reqs);
 }
-EXPORT_SYMBOL(aio_complete);
 
 /* aio_read_events_ring
  *     Pull an event off of the ioctx's event ring.  Returns the number of
@@ -1344,12 +1355,13 @@ typedef ssize_t (rw_iter_op)(struct kiocb *, struct iov_iter *);
 static ssize_t aio_setup_vectored_rw(struct kiocb *kiocb,
                                     int rw, char __user *buf,
                                     unsigned long *nr_segs,
+                                    size_t *len,
                                     struct iovec **iovec,
                                     bool compat)
 {
        ssize_t ret;
 
-       *nr_segs = kiocb->ki_nbytes;
+       *nr_segs = *len;
 
 #ifdef CONFIG_COMPAT
        if (compat)
@@ -1364,21 +1376,22 @@ static ssize_t aio_setup_vectored_rw(struct kiocb *kiocb,
        if (ret < 0)
                return ret;
 
-       /* ki_nbytes now reflect bytes instead of segs */
-       kiocb->ki_nbytes = ret;
+       /* len now reflects bytes instead of segs */
+       *len = ret;
        return 0;
 }
 
 static ssize_t aio_setup_single_vector(struct kiocb *kiocb,
                                       int rw, char __user *buf,
                                       unsigned long *nr_segs,
+                                      size_t len,
                                       struct iovec *iovec)
 {
-       if (unlikely(!access_ok(!rw, buf, kiocb->ki_nbytes)))
+       if (unlikely(!access_ok(!rw, buf, len)))
                return -EFAULT;
 
        iovec->iov_base = buf;
-       iovec->iov_len = kiocb->ki_nbytes;
+       iovec->iov_len = len;
        *nr_segs = 1;
        return 0;
 }
@@ -1388,7 +1401,7 @@ static ssize_t aio_setup_single_vector(struct kiocb *kiocb,
  *     Performs the initial checks and io submission.
  */
 static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
-                           char __user *buf, bool compat)
+                           char __user *buf, size_t len, bool compat)
 {
        struct file *file = req->ki_filp;
        ssize_t ret;
@@ -1423,21 +1436,21 @@ rw_common:
                if (!rw_op && !iter_op)
                        return -EINVAL;
 
-               ret = (opcode == IOCB_CMD_PREADV ||
-                      opcode == IOCB_CMD_PWRITEV)
-                       ? aio_setup_vectored_rw(req, rw, buf, &nr_segs,
-                                               &iovec, compat)
-                       : aio_setup_single_vector(req, rw, buf, &nr_segs,
-                                                 iovec);
+               if (opcode == IOCB_CMD_PREADV || opcode == IOCB_CMD_PWRITEV)
+                       ret = aio_setup_vectored_rw(req, rw, buf, &nr_segs,
+                                               &len, &iovec, compat);
+               else
+                       ret = aio_setup_single_vector(req, rw, buf, &nr_segs,
+                                                 len, iovec);
                if (!ret)
-                       ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
+                       ret = rw_verify_area(rw, file, &req->ki_pos, len);
                if (ret < 0) {
                        if (iovec != inline_vecs)
                                kfree(iovec);
                        return ret;
                }
 
-               req->ki_nbytes = ret;
+               len = ret;
 
                /* XXX: move/kill - rw_verify_area()? */
                /* This matches the pread()/pwrite() logic */
@@ -1450,7 +1463,7 @@ rw_common:
                        file_start_write(file);
 
                if (iter_op) {
-                       iov_iter_init(&iter, rw, iovec, nr_segs, req->ki_nbytes);
+                       iov_iter_init(&iter, rw, iovec, nr_segs, len);
                        ret = iter_op(req, &iter);
                } else {
                        ret = rw_op(req, iovec, nr_segs, req->ki_pos);
@@ -1500,7 +1513,7 @@ rw_common:
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
                         struct iocb *iocb, bool compat)
 {
-       struct kiocb *req;
+       struct aio_kiocb *req;
        ssize_t ret;
 
        /* enforce forwards compatibility on users */
@@ -1523,11 +1536,14 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
        if (unlikely(!req))
                return -EAGAIN;
 
-       req->ki_filp = fget(iocb->aio_fildes);
-       if (unlikely(!req->ki_filp)) {
+       req->common.ki_filp = fget(iocb->aio_fildes);
+       if (unlikely(!req->common.ki_filp)) {
                ret = -EBADF;
                goto out_put_req;
        }
+       req->common.ki_pos = iocb->aio_offset;
+       req->common.ki_complete = aio_complete;
+       req->common.ki_flags = 0;
 
        if (iocb->aio_flags & IOCB_FLAG_RESFD) {
                /*
@@ -1542,6 +1558,8 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
                        req->ki_eventfd = NULL;
                        goto out_put_req;
                }
+
+               req->common.ki_flags |= IOCB_EVENTFD;
        }
 
        ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
@@ -1550,13 +1568,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
                goto out_put_req;
        }
 
-       req->ki_obj.user = user_iocb;
+       req->ki_user_iocb = user_iocb;
        req->ki_user_data = iocb->aio_data;
-       req->ki_pos = iocb->aio_offset;
-       req->ki_nbytes = iocb->aio_nbytes;
 
-       ret = aio_run_iocb(req, iocb->aio_lio_opcode,
+       ret = aio_run_iocb(&req->common, iocb->aio_lio_opcode,
                           (char __user *)(unsigned long)iocb->aio_buf,
+                          iocb->aio_nbytes,
                           compat);
        if (ret)
                goto out_put_req;
@@ -1643,10 +1660,10 @@ SYSCALL_DEFINE3(io_submit, aio_context_t, ctx_id, long, nr,
 /* lookup_kiocb
  *     Finds a given iocb for cancellation.
  */
-static struct kiocb *lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb,
-                                 u32 key)
+static struct aio_kiocb *
+lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb, u32 key)
 {
-       struct list_head *pos;
+       struct aio_kiocb *kiocb;
 
        assert_spin_locked(&ctx->ctx_lock);
 
@@ -1654,9 +1671,8 @@ static struct kiocb *lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb,
                return NULL;
 
        /* TODO: use a hash or array, this sucks. */
-       list_for_each(pos, &ctx->active_reqs) {
-               struct kiocb *kiocb = list_kiocb(pos);
-               if (kiocb->ki_obj.user == iocb)
+       list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
+               if (kiocb->ki_user_iocb == iocb)
                        return kiocb;
        }
        return NULL;
@@ -1676,7 +1692,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
                struct io_event __user *, result)
 {
        struct kioctx *ctx;
-       struct kiocb *kiocb;
+       struct aio_kiocb *kiocb;
        u32 key;
        int ret;
 
index 90bc079d9982928b7a9b5bcb6ad3efd6ebf1375f..fdcb4d69f430db6370e1eed7c1c04c9a3f333746 100644 (file)
@@ -15,6 +15,7 @@
 #include <linux/buffer_head.h>
 #include <linux/vfs.h>
 #include <linux/writeback.h>
+#include <linux/uio.h>
 #include <asm/uaccess.h>
 #include "bfs.h"
 
index 975266be67d319aa019a48e94cfda0a3ca8ce1e0..2e522aed6584d3a7837155897b2a47085b5ed303 100644 (file)
@@ -27,7 +27,6 @@
 #include <linux/namei.h>
 #include <linux/log2.h>
 #include <linux/cleancache.h>
-#include <linux/aio.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
index 30982bbd31c30c2b154836b0f51b94c37e22923c..aee18f84e3159c1e369a0c14eff0baad8ddbb59a 100644 (file)
@@ -24,7 +24,6 @@
 #include <linux/string.h>
 #include <linux/backing-dev.h>
 #include <linux/mpage.h>
-#include <linux/aio.h>
 #include <linux/falloc.h>
 #include <linux/swap.h>
 #include <linux/writeback.h>
@@ -32,6 +31,7 @@
 #include <linux/compat.h>
 #include <linux/slab.h>
 #include <linux/btrfs.h>
+#include <linux/uio.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
index d2e732d7af524640bc2c197da3e7123182b4537e..686331f22b15ce0fcc8233c2529a50c2eb6190c7 100644 (file)
@@ -32,7 +32,6 @@
 #include <linux/writeback.h>
 #include <linux/statfs.h>
 #include <linux/compat.h>
-#include <linux/aio.h>
 #include <linux/bit_spinlock.h>
 #include <linux/xattr.h>
 #include <linux/posix_acl.h>
@@ -43,6 +42,7 @@
 #include <linux/btrfs.h>
 #include <linux/blkdev.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/uio.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
index d533075a823d5eb92e709547b8fe790c59cba981..139f2fea91a0fe8472cf138e900bfe580db43f50 100644 (file)
@@ -7,7 +7,6 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/writeback.h>
-#include <linux/aio.h>
 #include <linux/falloc.h>
 
 #include "super.h"
@@ -808,7 +807,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
        struct file *filp = iocb->ki_filp;
        struct ceph_file_info *fi = filp->private_data;
-       size_t len = iocb->ki_nbytes;
+       size_t len = iov_iter_count(to);
        struct inode *inode = file_inode(filp);
        struct ceph_inode_info *ci = ceph_inode(inode);
        struct page *pinned_page = NULL;
index e181b6b2e297fb5d3bd03a07efe382f0dd204972..6fb00e3f1059791d21b4ffc80671f3d051ecbc8e 100644 (file)
@@ -37,7 +37,6 @@
 #include <linux/uio.h>
 #include <linux/atomic.h>
 #include <linux/prefetch.h>
-#include <linux/aio.h>
 
 /*
  * How many user pages to map in one call to get_user_pages().  This determines
@@ -265,7 +264,7 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret,
                                ret = err;
                }
 
-               aio_complete(dio->iocb, ret, 0);
+               dio->iocb->ki_complete(dio->iocb, ret, 0);
        }
 
        kmem_cache_free(dio_cache, dio);
@@ -1056,7 +1055,7 @@ static inline int drop_refcount(struct dio *dio)
         * operation.  AIO can if it was a broken operation described above or
         * in fact if all the bios race to complete before we get here.  In
         * that case dio_complete() translates the EIOCBQUEUED into the proper
-        * return code that the caller will hand to aio_complete().
+        * return code that the caller will hand to ->complete().
         *
         * This is managed by the bio_lock instead of being an atomic_t so that
         * completion paths can drop their ref and use the remaining count to
index fd39bad6f1bdf8bbcb4321a8fc8ff1934d67167c..79675089443df98c23de1602d3ca59a887cbb4d4 100644 (file)
@@ -31,7 +31,6 @@
 #include <linux/security.h>
 #include <linux/compat.h>
 #include <linux/fs_stack.h>
-#include <linux/aio.h>
 #include "ecryptfs_kernel.h"
 
 /**
@@ -52,12 +51,6 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
        struct file *file = iocb->ki_filp;
 
        rc = generic_file_read_iter(iocb, to);
-       /*
-        * Even though this is a async interface, we need to wait
-        * for IO to finish to update atime
-        */
-       if (-EIOCBQUEUED == rc)
-               rc = wait_on_sync_kiocb(iocb);
        if (rc >= 0) {
                path = ecryptfs_dentry_to_lower_path(file->f_path.dentry);
                touch_atime(path);
index 6434bc00012517a30ace1cb97f2160b0c48eea3a..df9d6afbc5d5eb745e00a9a2575b28890e94d7b7 100644 (file)
@@ -31,7 +31,7 @@
 #include <linux/mpage.h>
 #include <linux/fiemap.h>
 #include <linux/namei.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include "ext2.h"
 #include "acl.h"
 #include "xattr.h"
index 2c6ccc49ba279cacf77fe6609fe44a50b970898c..db07ffbe7c85cdabbe89d49b3b2294dd8a1d84cf 100644 (file)
@@ -27,7 +27,7 @@
 #include <linux/writeback.h>
 #include <linux/mpage.h>
 #include <linux/namei.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include "ext3.h"
 #include "xattr.h"
 #include "acl.h"
index 33a09da16c9ce1e8049fdcacdf3e8833410fd78f..598abbbe678619c347dbe53d8dfe2c8cd6c86c65 100644 (file)
@@ -23,9 +23,9 @@
 #include <linux/jbd2.h>
 #include <linux/mount.h>
 #include <linux/path.h>
-#include <linux/aio.h>
 #include <linux/quotaops.h>
 #include <linux/pagevec.h>
+#include <linux/uio.h>
 #include "ext4.h"
 #include "ext4_jbd2.h"
 #include "xattr.h"
index 45fe924f82bce2ff76e3e74b45ec1833729433ea..740c7871c11770a683395989df5548d3d3357c22 100644 (file)
@@ -20,9 +20,9 @@
  *     (sct@redhat.com), 1993, 1998
  */
 
-#include <linux/aio.h>
 #include "ext4_jbd2.h"
 #include "truncate.h"
+#include <linux/uio.h>
 
 #include <trace/events/ext4.h>
 
index 5cb9a212b86f3efd69ca604df07dc20b901dabb1..a3f451370bef4b49a23343daf63af2459de0741c 100644 (file)
@@ -37,7 +37,6 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/ratelimit.h>
-#include <linux/aio.h>
 #include <linux/bitops.h>
 
 #include "ext4_jbd2.h"
index b24a2541a9baaa0d4c22e80a75050af2517a417d..464984261e698af8317621c45b8d2089551bc790 100644 (file)
@@ -18,7 +18,6 @@
 #include <linux/pagevec.h>
 #include <linux/mpage.h>
 #include <linux/namei.h>
-#include <linux/aio.h>
 #include <linux/uio.h>
 #include <linux/bio.h>
 #include <linux/workqueue.h>
index 985ed023a750170b924455ea23e2684c50baeba4..497f8515d2056283d040b912dd638e65a4576fe7 100644 (file)
 #include <linux/f2fs_fs.h>
 #include <linux/buffer_head.h>
 #include <linux/mpage.h>
-#include <linux/aio.h>
 #include <linux/writeback.h>
 #include <linux/backing-dev.h>
 #include <linux/blkdev.h>
 #include <linux/bio.h>
 #include <linux/prefetch.h>
+#include <linux/uio.h>
 
 #include "f2fs.h"
 #include "node.h"
index 497c7c5263c7ca3962c385605fbbb558d351f759..8521207de22935464f074b70448cae781ed403e1 100644 (file)
@@ -19,7 +19,6 @@
 #include <linux/mpage.h>
 #include <linux/buffer_head.h>
 #include <linux/mount.h>
-#include <linux/aio.h>
 #include <linux/vfs.h>
 #include <linux/parser.h>
 #include <linux/uio.h>
index 28d0c7abba1c2fa7748d3b1c2874b855427b3897..b3fa0503223411ff97b3ab5c2f775310b1f3cccb 100644 (file)
@@ -38,7 +38,6 @@
 #include <linux/device.h>
 #include <linux/file.h>
 #include <linux/fs.h>
-#include <linux/aio.h>
 #include <linux/kdev_t.h>
 #include <linux/kthread.h>
 #include <linux/list.h>
@@ -48,6 +47,7 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/module.h>
+#include <linux/uio.h>
 
 #include "fuse_i.h"
 
index 39706c57ad3cb157d81594065a15f154f61d7bd8..95a2797eef66d8db6edb1c7c4310be292744a427 100644 (file)
@@ -19,7 +19,6 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/swap.h>
 #include <linux/splice.h>
-#include <linux/aio.h>
 
 MODULE_ALIAS_MISCDEV(FUSE_MINOR);
 MODULE_ALIAS("devname:fuse");
index c01ec3bdcfd81090fae2cb26ae166f351d4505eb..ff102cbf16eab45bdd74eb7cffeafc99a4d7b064 100644 (file)
@@ -15,8 +15,8 @@
 #include <linux/module.h>
 #include <linux/compat.h>
 #include <linux/swap.h>
-#include <linux/aio.h>
 #include <linux/falloc.h>
+#include <linux/uio.h>
 
 static const struct file_operations fuse_direct_io_file_operations;
 
@@ -528,6 +528,17 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
        }
 }
 
+static ssize_t fuse_get_res_by_io(struct fuse_io_priv *io)
+{
+       if (io->err)
+               return io->err;
+
+       if (io->bytes >= 0 && io->write)
+               return -EIO;
+
+       return io->bytes < 0 ? io->size : io->bytes;
+}
+
 /**
  * In case of short read, the caller sets 'pos' to the position of
  * actual end of fuse request in IO request. Otherwise, if bytes_requested
@@ -546,6 +557,7 @@ static void fuse_release_user_pages(struct fuse_req *req, int write)
  */
 static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
 {
+       bool is_sync = is_sync_kiocb(io->iocb);
        int left;
 
        spin_lock(&io->lock);
@@ -555,30 +567,24 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
                io->bytes = pos;
 
        left = --io->reqs;
+       if (!left && is_sync)
+               complete(io->done);
        spin_unlock(&io->lock);
 
-       if (!left) {
-               long res;
+       if (!left && !is_sync) {
+               ssize_t res = fuse_get_res_by_io(io);
 
-               if (io->err)
-                       res = io->err;
-               else if (io->bytes >= 0 && io->write)
-                       res = -EIO;
-               else {
-                       res = io->bytes < 0 ? io->size : io->bytes;
+               if (res >= 0) {
+                       struct inode *inode = file_inode(io->iocb->ki_filp);
+                       struct fuse_conn *fc = get_fuse_conn(inode);
+                       struct fuse_inode *fi = get_fuse_inode(inode);
 
-                       if (!is_sync_kiocb(io->iocb)) {
-                               struct inode *inode = file_inode(io->iocb->ki_filp);
-                               struct fuse_conn *fc = get_fuse_conn(inode);
-                               struct fuse_inode *fi = get_fuse_inode(inode);
-
-                               spin_lock(&fc->lock);
-                               fi->attr_version = ++fc->attr_version;
-                               spin_unlock(&fc->lock);
-                       }
+                       spin_lock(&fc->lock);
+                       fi->attr_version = ++fc->attr_version;
+                       spin_unlock(&fc->lock);
                }
 
-               aio_complete(io->iocb, res, 0);
+               io->iocb->ki_complete(io->iocb, res, 0);
                kfree(io);
        }
 }
@@ -2801,6 +2807,7 @@ static ssize_t
 fuse_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
                        loff_t offset)
 {
+       DECLARE_COMPLETION_ONSTACK(wait);
        ssize_t ret = 0;
        struct file *file = iocb->ki_filp;
        struct fuse_file *ff = file->private_data;
@@ -2852,6 +2859,9 @@ fuse_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
        if (!is_sync_kiocb(iocb) && (offset + count > i_size) && rw == WRITE)
                io->async = false;
 
+       if (io->async && is_sync_kiocb(iocb))
+               io->done = &wait;
+
        if (rw == WRITE)
                ret = __fuse_direct_write(io, iter, &pos);
        else
@@ -2864,11 +2874,12 @@ fuse_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter,
                if (!is_sync_kiocb(iocb))
                        return -EIOCBQUEUED;
 
-               ret = wait_on_sync_kiocb(iocb);
-       } else {
-               kfree(io);
+               wait_for_completion(&wait);
+               ret = fuse_get_res_by_io(io);
        }
 
+       kfree(io);
+
        if (rw == WRITE) {
                if (ret > 0)
                        fuse_write_update_size(inode, pos);
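
The fuse side replaces wait_on_sync_kiocb() with an on-stack completion: whichever sub-request drops io->reqs to zero either signals the waiting submitter (sync case) or calls ki_complete (async case). A single-threaded model of that last-reference logic; the kernel version runs the counting under io->lock:

#include <stdbool.h>
#include <stdio.h>

struct io_model {
        int reqs;               /* in-flight sub-requests */
        bool is_sync;           /* a submitter is blocked on "done" */
        bool done;              /* stands in for struct completion */
};

static void sub_request_done(struct io_model *io)
{
        if (--io->reqs)
                return;                         /* not the last one */
        if (io->is_sync)
                io->done = true;                /* complete(io->done) */
        else
                puts("ki_complete(iocb)");      /* async notification */
}

int main(void)
{
        struct io_model io = { .reqs = 2, .is_sync = true };

        sub_request_done(&io);
        sub_request_done(&io);
        printf("done = %d\n", io.done);
        return 0;
}
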
index 1cdfb07c1376b4f4b5633e86fdbdfc4320953de2..7354dc142a50845a62e9a413d82d185afc1f5b0d 100644 (file)
@@ -263,6 +263,7 @@ struct fuse_io_priv {
        int err;
        struct kiocb *iocb;
        struct file *file;
+       struct completion *done;
 };
 
 /**
index 4ad4f94edebe25cc8afa3fa7c4ec35913cb00642..fe6634d25d1ddb591a60a40f32f054cae89ed5f2 100644 (file)
@@ -20,7 +20,7 @@
 #include <linux/swap.h>
 #include <linux/gfs2_ondisk.h>
 #include <linux/backing-dev.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include <trace/events/writeback.h>
 
 #include "gfs2.h"
index 3e32bb8e2d7e573df59dc360a1ac38fe4ec759bd..f6fc412b1100e7f1340f6f479ebbcf8e18edca6c 100644 (file)
@@ -25,7 +25,6 @@
 #include <asm/uaccess.h>
 #include <linux/dlm.h>
 #include <linux/dlm_plock.h>
-#include <linux/aio.h>
 #include <linux/delay.h>
 
 #include "gfs2.h"
index d0929bc817826e012cc829bb0f021832eea24379..98d4ea45bb70aad886641f66d81e0007f0d0e34d 100644 (file)
@@ -14,7 +14,7 @@
 #include <linux/pagemap.h>
 #include <linux/mpage.h>
 #include <linux/sched.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 #include "hfs_fs.h"
 #include "btree.h"
index 0cf786f2d046f9fbae9b110a2a2d212c008fb3aa..f541196d4ee910a3f9ec6ce26841937c9fa7eb73 100644 (file)
@@ -14,7 +14,7 @@
 #include <linux/pagemap.h>
 #include <linux/mpage.h>
 #include <linux/sched.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"
index bd3df1ca3c9b7f955571c056f86f98e97beda7b9..3197aed106148d8b0839b80405ecad125c14e7aa 100644 (file)
@@ -22,8 +22,8 @@
 #include <linux/buffer_head.h>
 #include <linux/pagemap.h>
 #include <linux/quotaops.h>
+#include <linux/uio.h>
 #include <linux/writeback.h>
-#include <linux/aio.h>
 #include "jfs_incore.h"
 #include "jfs_inode.h"
 #include "jfs_filsys.h"
index e907c8cf732e3cff6bc9711ccf0b20c9261cdca2..c3929fb2ab26c2971e2e4a9f09e0f5a88387c144 100644 (file)
@@ -265,7 +265,7 @@ ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter, loff_t
 
        return -EINVAL;
 #else
-       VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
+       VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
        if (rw == READ)
                return nfs_file_direct_read(iocb, iter, pos);
@@ -393,7 +393,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq, bool write)
                long res = (long) dreq->error;
                if (!res)
                        res = (long) dreq->count;
-               aio_complete(dreq->iocb, res, 0);
+               dreq->iocb->ki_complete(dreq->iocb, res, 0);
        }
 
        complete_all(&dreq->completion);
index e679d24c39d3a57d5ef510a22d5ccbe2832c5335..37b15582e0de960a966e80fcda3aa1680caa094d 100644 (file)
@@ -26,7 +26,6 @@
 #include <linux/nfs_mount.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
-#include <linux/aio.h>
 #include <linux/gfp.h>
 #include <linux/swap.h>
 
index 8b5969538f39229cede14416a067d2e056c1a677..ab4987bc637f8b084298086cf48f00e31bfe1298 100644 (file)
@@ -26,7 +26,7 @@
 #include <linux/mpage.h>
 #include <linux/pagemap.h>
 #include <linux/writeback.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 #include "nilfs.h"
 #include "btnode.h"
 #include "segment.h"
index 1da9b2d184dc4e32d9ac9a95eb0ee2553c5a1e46..f16f2d8401febeaf4911491e92f8deac377e8a93 100644 (file)
@@ -28,7 +28,6 @@
 #include <linux/swap.h>
 #include <linux/uio.h>
 #include <linux/writeback.h>
-#include <linux/aio.h>
 
 #include <asm/page.h>
 #include <asm/uaccess.h>
index 898b9949d36357a8b7998600f3fdbacaa498d08f..1d0c21df0d805cd73248afd42dc05c1108c49700 100644 (file)
@@ -28,7 +28,6 @@
 #include <linux/quotaops.h>
 #include <linux/slab.h>
 #include <linux/log2.h>
-#include <linux/aio.h>
 
 #include "aops.h"
 #include "attrib.h"
index 44db1808cdb598df6b91548410b3634480c06c31..e1bf18c5d25e1cc907abf53e4ac984870dc0584e 100644 (file)
@@ -29,6 +29,7 @@
 #include <linux/mpage.h>
 #include <linux/quotaops.h>
 #include <linux/blkdev.h>
+#include <linux/uio.h>
 
 #include <cluster/masklog.h>
 
index 6cae155d54df0d68be4f90f4754d15c30302159c..dd59599b022d5ab26dffd82807d048cac170a154 100644 (file)
@@ -22,7 +22,7 @@
 #ifndef OCFS2_AOPS_H
 #define OCFS2_AOPS_H
 
-#include <linux/aio.h>
+#include <linux/fs.h>
 
 handle_t *ocfs2_start_walk_page_trans(struct inode *inode,
                                                         struct page *page,
index 46e0d4e857c7f493f512196603d3725ca8d3dfaa..266845de210016e5cc25a71599116f7f1f3b3290 100644 (file)
@@ -2280,7 +2280,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb *iocb,
                file->f_path.dentry->d_name.name,
                (unsigned int)from->nr_segs);   /* GRRRRR */
 
-       if (iocb->ki_nbytes == 0)
+       if (count == 0)
                return 0;
 
        appending = file->f_flags & O_APPEND ? 1 : 0;
@@ -2330,8 +2330,7 @@ relock:
        }
 
        can_do_direct = direct_io;
-       ret = ocfs2_prepare_inode_for_write(file, ppos,
-                                           iocb->ki_nbytes, appending,
+       ret = ocfs2_prepare_inode_for_write(file, ppos, count, appending,
                                            &can_do_direct, &has_refcount);
        if (ret < 0) {
                mlog_errno(ret);
@@ -2339,8 +2338,7 @@ relock:
        }
 
        if (direct_io && !is_sync_kiocb(iocb))
-               unaligned_dio = ocfs2_is_io_unaligned(inode, iocb->ki_nbytes,
-                                                     *ppos);
+               unaligned_dio = ocfs2_is_io_unaligned(inode, count, *ppos);
 
        /*
         * We can't complete the direct I/O as requested, fall back to
index 21981e58e2a634c09b9ebb9b327860d849fb6b53..2d084f2d0b83c698a7df720c35d2fdbeadb65fcb 100644 (file)
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -21,7 +21,6 @@
 #include <linux/audit.h>
 #include <linux/syscalls.h>
 #include <linux/fcntl.h>
-#include <linux/aio.h>
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
index 8e1b68786d663d4be5551efcd7b0bf7d5ed8b192..99a6ef946d0182711542b77f17b22bc84cfe7526 100644 (file)
@@ -9,7 +9,6 @@
 #include <linux/fcntl.h>
 #include <linux/file.h>
 #include <linux/uio.h>
-#include <linux/aio.h>
 #include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/export.h>
@@ -343,13 +342,10 @@ ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos)
 
        init_sync_kiocb(&kiocb, file);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = iov_iter_count(iter);
 
        iter->type |= READ;
        ret = file->f_op->read_iter(&kiocb, iter);
-       if (ret == -EIOCBQUEUED)
-               ret = wait_on_sync_kiocb(&kiocb);
-
+       BUG_ON(ret == -EIOCBQUEUED);
        if (ret > 0)
                *ppos = kiocb.ki_pos;
        return ret;
@@ -366,13 +362,10 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos)
 
        init_sync_kiocb(&kiocb, file);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = iov_iter_count(iter);
 
        iter->type |= WRITE;
        ret = file->f_op->write_iter(&kiocb, iter);
-       if (ret == -EIOCBQUEUED)
-               ret = wait_on_sync_kiocb(&kiocb);
-
+       BUG_ON(ret == -EIOCBQUEUED);
        if (ret > 0)
                *ppos = kiocb.ki_pos;
        return ret;
@@ -426,11 +419,9 @@ ssize_t do_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *pp
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
 
        ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos);
-       if (-EIOCBQUEUED == ret)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
@@ -446,12 +437,10 @@ ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *p
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
        iov_iter_init(&iter, READ, &iov, 1, len);
 
        ret = filp->f_op->read_iter(&kiocb, &iter);
-       if (-EIOCBQUEUED == ret)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
@@ -510,11 +499,9 @@ ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, lof
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
 
        ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
-       if (-EIOCBQUEUED == ret)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
@@ -530,12 +517,10 @@ ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, lo
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
        iov_iter_init(&iter, WRITE, &iov, 1, len);
 
        ret = filp->f_op->write_iter(&kiocb, &iter);
-       if (-EIOCBQUEUED == ret)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
@@ -719,12 +704,10 @@ static ssize_t do_iter_readv_writev(struct file *filp, int rw, const struct iove
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
 
        iov_iter_init(&iter, rw, iov, nr_segs, len);
        ret = fn(&kiocb, &iter);
-       if (ret == -EIOCBQUEUED)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
@@ -737,11 +720,9 @@ static ssize_t do_sync_readv_writev(struct file *filp, const struct iovec *iov,
 
        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
-       kiocb.ki_nbytes = len;
 
        ret = fn(&kiocb, iov, nr_segs, kiocb.ki_pos);
-       if (ret == -EIOCBQUEUED)
-               ret = wait_on_sync_kiocb(&kiocb);
+       BUG_ON(ret == -EIOCBQUEUED);
        *ppos = kiocb.ki_pos;
        return ret;
 }
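
After this conversion a synchronous kiocb must be completed before ->read_iter()/->write_iter() returns; -EIOCBQUEUED on the sync path is no longer a state to wait out but a bug, which is exactly what the BUG_ON()s assert. The post-series shape of a synchronous iter read, sketched with userspace stand-ins:

#include <stdio.h>

#define EIOCBQUEUED 529         /* kernel-internal "will complete later" */

struct kiocb_model { long ki_pos; };

static long fake_read_iter(struct kiocb_model *kiocb, char *buf, long len)
{
        (void)buf;
        kiocb->ki_pos += len;   /* completes inline, as sync kiocbs now must */
        return len;
}

static long sync_read(char *buf, long len, long *ppos)
{
        struct kiocb_model kiocb = { .ki_pos = *ppos };
        long ret = fake_read_iter(&kiocb, buf, len);

        if (ret == -EIOCBQUEUED)
                return -1;      /* the kernel version BUG()s here */
        *ppos = kiocb.ki_pos;
        return ret;
}

int main(void)
{
        char buf[16];
        long pos = 0;
        long ret = sync_read(buf, sizeof(buf), &pos);

        printf("read %ld, pos now %ld\n", ret, pos);
        return 0;
}
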
index e72401e1f9956238064c91805279233a721bffe1..9312b7842e036f64ac02135102b445f0769e7702 100644 (file)
@@ -18,7 +18,7 @@
 #include <linux/writeback.h>
 #include <linux/quotaops.h>
 #include <linux/swap.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 int reiserfs_commit_write(struct file *f, struct page *page,
                          unsigned from, unsigned to);
index 7968da96bebbb5d1cd087cbfa2ece65c09cc8b4a..4bbfa95b5bfea8b20aa557cff35c2efd610289da 100644 (file)
@@ -32,7 +32,6 @@
 #include <linux/gfp.h>
 #include <linux/socket.h>
 #include <linux/compat.h>
-#include <linux/aio.h>
 #include "internal.h"
 
 /*
index e627c0acf6264f6aabc4b2777ab79214e9c32e64..c3d15fe834033d4d080ea408dbbb961f1e4719c6 100644 (file)
@@ -50,7 +50,6 @@
  */
 
 #include "ubifs.h"
-#include <linux/aio.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/slab.h>
index 08f3555fbeac3f6ceeda033cb8f6ec82557623d0..7f885cc8b0b798dca3239a9f72f87741fd8b023f 100644 (file)
@@ -34,7 +34,7 @@
 #include <linux/errno.h>
 #include <linux/pagemap.h>
 #include <linux/buffer_head.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 #include "udf_i.h"
 #include "udf_sb.h"
@@ -122,7 +122,7 @@ static ssize_t udf_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
        struct file *file = iocb->ki_filp;
        struct inode *inode = file_inode(file);
        int err, pos;
-       size_t count = iocb->ki_nbytes;
+       size_t count = iov_iter_count(from);
        struct udf_inode_info *iinfo = UDF_I(inode);
 
        mutex_lock(&inode->i_mutex);
index a445d599098d7ad1ccace2a81a86a0bc563af391..9c1fbd23913db541c3facc1342614793b1403bfd 100644 (file)
@@ -38,7 +38,7 @@
 #include <linux/slab.h>
 #include <linux/crc-itu-t.h>
 #include <linux/mpage.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 #include "udf_i.h"
 #include "udf_sb.h"
index 3a9b7a1b8704be66439ea797dd2183c035a929e5..4f8cdc59bc38154b45f1adfd69c0a371df4394e3 100644 (file)
@@ -31,7 +31,6 @@
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
 #include "xfs_bmap_btree.h"
-#include <linux/aio.h>
 #include <linux/gfp.h>
 #include <linux/mpage.h>
 #include <linux/pagevec.h>
index a2e1cb8a568bf9d45e32c43539a2e6f8b56d83f4..f44212fae65327347db1ba9f8ec1739d30650f44 100644 (file)
@@ -38,7 +38,6 @@
 #include "xfs_icache.h"
 #include "xfs_pnfs.h"
 
-#include <linux/aio.h>
 #include <linux/dcache.h>
 #include <linux/falloc.h>
 #include <linux/pagevec.h>
index d9c92daa3944e43a13f285a7baaa1443868cce3d..9eb42dbc5582ace99283629f0905861ac820c7d5 100644 (file)
@@ -1,86 +1,23 @@
 #ifndef __LINUX__AIO_H
 #define __LINUX__AIO_H
 
-#include <linux/list.h>
-#include <linux/workqueue.h>
 #include <linux/aio_abi.h>
-#include <linux/uio.h>
-#include <linux/rcupdate.h>
-
-#include <linux/atomic.h>
 
 struct kioctx;
 struct kiocb;
+struct mm_struct;
 
 #define KIOCB_KEY              0
 
-/*
- * We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
- * cancelled or completed (this makes a certain amount of sense because
- * successful cancellation - io_cancel() - does deliver the completion to
- * userspace).
- *
- * And since most things don't implement kiocb cancellation and we'd really like
- * kiocb completion to be lockless when possible, we use ki_cancel to
- * synchronize cancellation and completion - we only set it to KIOCB_CANCELLED
- * with xchg() or cmpxchg(), see batch_complete_aio() and kiocb_cancel().
- */
-#define KIOCB_CANCELLED                ((void *) (~0ULL))
-
 typedef int (kiocb_cancel_fn)(struct kiocb *);
 
-struct kiocb {
-       struct file             *ki_filp;
-       struct kioctx           *ki_ctx;        /* NULL for sync ops */
-       kiocb_cancel_fn         *ki_cancel;
-       void                    *private;
-
-       union {
-               void __user             *user;
-               struct task_struct      *tsk;
-       } ki_obj;
-
-       __u64                   ki_user_data;   /* user's data for completion */
-       loff_t                  ki_pos;
-       size_t                  ki_nbytes;      /* copy of iocb->aio_nbytes */
-
-       struct list_head        ki_list;        /* the aio core uses this
-                                                * for cancellation */
-
-       /*
-        * If the aio_resfd field of the userspace iocb is not zero,
-        * this is the underlying eventfd context to deliver events to.
-        */
-       struct eventfd_ctx      *ki_eventfd;
-};
-
-static inline bool is_sync_kiocb(struct kiocb *kiocb)
-{
-       return kiocb->ki_ctx == NULL;
-}
-
-static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
-{
-       *kiocb = (struct kiocb) {
-                       .ki_ctx = NULL,
-                       .ki_filp = filp,
-                       .ki_obj.tsk = current,
-               };
-}
-
 /* prototypes */
 #ifdef CONFIG_AIO
-extern ssize_t wait_on_sync_kiocb(struct kiocb *iocb);
-extern void aio_complete(struct kiocb *iocb, long res, long res2);
-struct mm_struct;
 extern void exit_aio(struct mm_struct *mm);
 extern long do_io_submit(aio_context_t ctx_id, long nr,
                         struct iocb __user *__user *iocbpp, bool compat);
 void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel);
 #else
-static inline ssize_t wait_on_sync_kiocb(struct kiocb *iocb) { return 0; }
-static inline void aio_complete(struct kiocb *iocb, long res, long res2) { }
-struct mm_struct;
 static inline void exit_aio(struct mm_struct *mm) { }
 static inline long do_io_submit(aio_context_t ctx_id, long nr,
                                struct iocb __user * __user *iocbpp,
@@ -89,11 +26,6 @@ static inline void kiocb_set_cancel_fn(struct kiocb *req,
                                       kiocb_cancel_fn *cancel) { }
 #endif /* CONFIG_AIO */
 
-static inline struct kiocb *list_kiocb(struct list_head *h)
-{
-       return list_entry(h, struct kiocb, ki_list);
-}
-
 /* for sysctl: */
 extern unsigned long aio_nr;
 extern unsigned long aio_max_nr;
index cab60661752237f736c817588d1d0e1a01469cdf..ae2982c0f7a60ed93339e767feaf1fc89aa02134 100644 (file)
@@ -11,6 +11,7 @@
 #define PHY_ID_BCM5421                 0x002060e0
 #define PHY_ID_BCM5464                 0x002060b0
 #define PHY_ID_BCM5461                 0x002060c0
+#define PHY_ID_BCM54616S               0x03625d10
 #define PHY_ID_BCM57780                        0x03625d90
 
 #define PHY_ID_BCM7250                 0xae025280
index f4131e8ead74965a73272949b3a9eae8fa08b5c7..fdce1ddf230cb95a413593978a04c84c02961b3c 100644 (file)
@@ -314,6 +314,28 @@ struct page;
 struct address_space;
 struct writeback_control;
 
+#define IOCB_EVENTFD           (1 << 0)
+
+struct kiocb {
+       struct file             *ki_filp;
+       loff_t                  ki_pos;
+       void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
+       void                    *private;
+       int                     ki_flags;
+};
+
+static inline bool is_sync_kiocb(struct kiocb *kiocb)
+{
+       return kiocb->ki_complete == NULL;
+}
+
+static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
+{
+       *kiocb = (struct kiocb) {
+               .ki_filp = filp,
+       };
+}
+
 /*
  * "descriptor" for what we're up to with a read.
  * This allows us to use the same read code yet
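
Note why init_sync_kiocb() can stay this small: assigning a compound literal zeroes every field that is not named, so ki_complete ends up NULL and the new is_sync_kiocb() test holds. A userspace model with the same layout:

#include <stdbool.h>
#include <stdio.h>

struct file;

struct kiocb_model {
        struct file *ki_filp;
        long long ki_pos;
        void (*ki_complete)(struct kiocb_model *iocb, long ret, long ret2);
        void *private_data;
        int ki_flags;
};

static bool is_sync_kiocb(const struct kiocb_model *kiocb)
{
        return kiocb->ki_complete == NULL;      /* no callback => sync */
}

static void init_sync_kiocb(struct kiocb_model *kiocb, struct file *filp)
{
        *kiocb = (struct kiocb_model) {
                .ki_filp = filp,                /* all else zero-initialized */
        };
}

int main(void)
{
        struct kiocb_model kiocb;

        init_sync_kiocb(&kiocb, NULL);
        printf("sync = %d\n", is_sync_kiocb(&kiocb));
        return 0;
}
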
index e74114bcca686f73517c635320bbdeeddb011770..738ea48be889e670275616bae5063259ca3d61d9 100644 (file)
@@ -211,7 +211,7 @@ int sock_create(int family, int type, int proto, struct socket **res);
 int sock_create_kern(int family, int type, int proto, struct socket **res);
 int sock_create_lite(int family, int type, int proto, struct socket **res);
 void sock_release(struct socket *sock);
-int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len);
+int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
                 int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
index 13acb3d8ecdd7c7bd2d85768c303119dc4b1f6b3..b5679aed660b82896ca003a1aedfbf5a77e446af 100644 (file)
@@ -1313,6 +1313,8 @@ enum netdev_priv_flags {
  *     @base_addr:     Device I/O address
  *     @irq:           Device IRQ number
  *
+ *     @carrier_changes:       Stats to monitor carrier on<->off transitions
+ *
  *     @state:         Generic network queuing layer state, see netdev_state_t
  *     @dev_list:      The global list of network devices
  *     @napi_list:     List entry, that is used for polling napi devices
@@ -1346,8 +1348,6 @@ enum netdev_priv_flags {
  *     @tx_dropped:    Dropped packets by core network,
  *                     do not use this in drivers
  *
- *     @carrier_changes:       Stats to monitor carrier on<->off transitions
- *
  *     @wireless_handlers:     List of functions to handle Wireless Extensions,
  *                             instead of ioctl,
  *                             see <net/iw_handler.h> for details.
@@ -1390,14 +1390,14 @@ enum netdev_priv_flags {
  *     @dev_port:              Used to differentiate devices that share
  *                             the same function
  *     @addr_list_lock:        XXX: need comments on this one
- *     @uc:                    unicast mac addresses
- *     @mc:                    multicast mac addresses
- *     @dev_addrs:             list of device hw addresses
- *     @queues_kset:           Group of all Kobjects in the Tx and RX queues
 *     @uc_promisc:            Counter indicating that promiscuous mode
 *                             has been enabled due to the need to listen to
 *                             additional unicast addresses in a device that
 *                             does not implement ndo_set_rx_mode()
+ *     @uc:                    unicast mac addresses
+ *     @mc:                    multicast mac addresses
+ *     @dev_addrs:             list of device hw addresses
+ *     @queues_kset:           Group of all Kobjects in the Tx and RX queues
 *     @promiscuity:           Number of times the NIC has been told to work
 *                             in promiscuous mode; if it becomes 0 the NIC
 *                             will exit promiscuous mode
@@ -1427,6 +1427,12 @@ enum netdev_priv_flags {
  *     @ingress_queue:         XXX: need comments on this one
  *     @broadcast:             hw bcast address
  *
+ *     @rx_cpu_rmap:   CPU reverse-mapping for RX completion interrupts,
+ *                     indexed by RX queue number. Assigned by driver.
+ *                     This must only be set if the ndo_rx_flow_steer
+ *                     operation is defined
+ *     @index_hlist:           Device index hash chain
+ *
  *     @_tx:                   Array of TX queues
  *     @num_tx_queues:         Number of TX queues allocated at alloc_netdev_mq() time
  *     @real_num_tx_queues:    Number of TX queues currently active in device
@@ -1436,11 +1442,6 @@ enum netdev_priv_flags {
  *
  *     @xps_maps:      XXX: need comments on this one
  *
- *     @rx_cpu_rmap:   CPU reverse-mapping for RX completion interrupts,
- *                     indexed by RX queue number. Assigned by driver.
- *                     This must only be set if the ndo_rx_flow_steer
- *                     operation is defined
- *
  *     @trans_start:           Time (in jiffies) of last Tx
  *     @watchdog_timeo:        Represents the timeout that is used by
  *                             the watchdog ( see dev_watchdog() )
@@ -1448,7 +1449,6 @@ enum netdev_priv_flags {
  *
  *     @pcpu_refcnt:           Number of references to this device
  *     @todo_list:             Delayed register/unregister
- *     @index_hlist:           Device index hash chain
  *     @link_watch_list:       XXX: need comments on this one
  *
  *     @reg_state:             Register/unregister state machine
@@ -1515,6 +1515,8 @@ struct net_device {
        unsigned long           base_addr;
        int                     irq;
 
+       atomic_t                carrier_changes;
+
        /*
         *      Some hardware also needs these fields (state,dev_list,
         *      napi_list,unreg_list,close_list) but they are not
@@ -1555,8 +1557,6 @@ struct net_device {
        atomic_long_t           rx_dropped;
        atomic_long_t           tx_dropped;
 
-       atomic_t                carrier_changes;
-
 #ifdef CONFIG_WIRELESS_EXT
        const struct iw_handler_def *   wireless_handlers;
        struct iw_public_data * wireless_data;
@@ -1596,6 +1596,8 @@ struct net_device {
        unsigned short          dev_id;
        unsigned short          dev_port;
        spinlock_t              addr_list_lock;
+       unsigned char           name_assign_type;
+       bool                    uc_promisc;
        struct netdev_hw_addr_list      uc;
        struct netdev_hw_addr_list      mc;
        struct netdev_hw_addr_list      dev_addrs;
@@ -1603,10 +1605,6 @@ struct net_device {
 #ifdef CONFIG_SYSFS
        struct kset             *queues_kset;
 #endif
-
-       unsigned char           name_assign_type;
-
-       bool                    uc_promisc;
        unsigned int            promiscuity;
        unsigned int            allmulti;
 
@@ -1653,7 +1651,10 @@ struct net_device {
 
        struct netdev_queue __rcu *ingress_queue;
        unsigned char           broadcast[MAX_ADDR_LEN];
-
+#ifdef CONFIG_RFS_ACCEL
+       struct cpu_rmap         *rx_cpu_rmap;
+#endif
+       struct hlist_node       index_hlist;
 
 /*
  * Cache lines mostly used on transmit path
@@ -1664,13 +1665,11 @@ struct net_device {
        struct Qdisc            *qdisc;
        unsigned long           tx_queue_len;
        spinlock_t              tx_global_lock;
+       int                     watchdog_timeo;
 
 #ifdef CONFIG_XPS
        struct xps_dev_maps __rcu *xps_maps;
 #endif
-#ifdef CONFIG_RFS_ACCEL
-       struct cpu_rmap         *rx_cpu_rmap;
-#endif
 
        /* These may be needed for future network-power-down code. */
 
@@ -1680,13 +1679,11 @@ struct net_device {
         */
        unsigned long           trans_start;
 
-       int                     watchdog_timeo;
        struct timer_list       watchdog_timer;
 
        int __percpu            *pcpu_refcnt;
        struct list_head        todo_list;
 
-       struct hlist_node       index_hlist;
        struct list_head        link_watch_list;
 
        enum { NETREG_UNINITIALIZED=0,
@@ -1751,7 +1748,6 @@ struct net_device {
 #endif
        struct phy_device *phydev;
        struct lock_class_key *qdisc_tx_busylock;
-       struct pm_qos_request   pm_qos_req;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
index 02fc86d2348e2157d19cd4c16c4574ef86ac4d07..6835c1279df77328894ef9bc1396d786e7f29dec 100644 (file)
@@ -134,7 +134,7 @@ struct netlink_callback {
 
 struct netlink_notify {
        struct net *net;
-       int portid;
+       u32 portid;
        int protocol;
 };
 
index 5db76a32fcaba3d9656fa5de6d077780d1d07e46..2da5d1081ad990b57a07715ea0751ae10a80ac54 100644 (file)
@@ -77,7 +77,20 @@ static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
        return rtnl_dereference(dev->ingress_queue);
 }
 
-extern struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
+struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
+
+#ifdef CONFIG_NET_CLS_ACT
+void net_inc_ingress_queue(void);
+void net_dec_ingress_queue(void);
+#else
+static inline void net_inc_ingress_queue(void)
+{
+}
+
+static inline void net_dec_ingress_queue(void)
+{
+}
+#endif
 
 extern void rtnetlink_init(void);
 extern void __rtnl_unlock(void);
index c9852ef7e317a6167675f68e7948b6aebea85947..5bf59c8493b763c6dc11ccce18da9c0ac6c0da46 100644 (file)
@@ -139,6 +139,11 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
        return __cmsg_nxthdr(__msg->msg_control, __msg->msg_controllen, __cmsg);
 }
 
+static inline size_t msg_data_left(struct msghdr *msg)
+{
+       return iov_iter_count(&msg->msg_iter);
+}
+
 /* "Socket"-level control message types: */
 
 #define        SCM_RIGHTS      0x01            /* rw: access rights (array of int) */
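
msg_data_left() is only a readability wrapper around iov_iter_count() on msg->msg_iter; the tcp_sendmsg() conversion further down in this series shows the intended use. A hedged sketch of the pattern, with the buffer handling left hypothetical:

	/* sketch: drain a sendmsg payload using the new helper */
	while (msg_data_left(msg)) {
		size_t chunk = min_t(size_t, msg_data_left(msg), PAGE_SIZE);

		if (copy_from_iter(buf, chunk, &msg->msg_iter) != chunk)
			return -EFAULT;
		/* ... transmit chunk ... */
	}
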
index 71880299ed487b68dc7b278248a4fb29ddb6b6ec..1f4a37f1f025827c9a561a8e5a856680062227cc 100644 (file)
@@ -139,4 +139,18 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
 size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
 
+int import_iovec(int type, const struct iovec __user * uvector,
+                unsigned nr_segs, unsigned fast_segs,
+                struct iovec **iov, struct iov_iter *i);
+
+#ifdef CONFIG_COMPAT
+struct compat_iovec;
+int compat_import_iovec(int type, const struct compat_iovec __user * uvector,
+                unsigned nr_segs, unsigned fast_segs,
+                struct iovec **iov, struct iov_iter *i);
+#endif
+
+int import_single_range(int type, void __user *buf, size_t len,
+                struct iovec *iov, struct iov_iter *i);
+
 #endif
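
These helpers centralize the rw_copy_check_uvector() + iov_iter_init() dance that callers previously open-coded (the net/compat.c hunk below is one such conversion). A hedged sketch of the caller pattern, with the consumer left hypothetical:

	struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
	struct iov_iter iter;
	ssize_t ret;

	ret = import_iovec(READ, uvec, nr_segs, ARRAY_SIZE(iovstack),
			   &iov, &iter);
	if (ret < 0)
		return ret;
	ret = do_iter_read(file, &iter);	/* hypothetical consumer */
	kfree(iov);	/* NULL when the on-stack fast array sufficed */
	return ret;
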
index 42a9c8431177c295276068e18967c1bce12e648e..48103cf94e976e9c13809cfec895e9bb1c9fa96c 100644 (file)
@@ -40,7 +40,7 @@ int compat_sock_get_timestampns(struct sock *, struct timespec __user *);
 #define compat_mmsghdr mmsghdr
 #endif /* defined(CONFIG_COMPAT) */
 
-ssize_t get_compat_msghdr(struct msghdr *, struct compat_msghdr __user *,
+int get_compat_msghdr(struct msghdr *, struct compat_msghdr __user *,
                      struct sockaddr __user **, struct iovec **);
 asmlinkage long compat_sys_sendmsg(int, struct compat_msghdr __user *,
                                   unsigned int);
index b7ce1003c429d43b0a69e45408eddd5fb35129e4..360c4802288db91a38b435bcf5b5d2eb71a8cd1f 100644 (file)
 
 struct inet_hashinfo;
 
-#define INET_TWDR_RECYCLE_SLOTS_LOG    5
-#define INET_TWDR_RECYCLE_SLOTS                (1 << INET_TWDR_RECYCLE_SLOTS_LOG)
-
-/*
- * If time > 4sec, it is "slow" path, no recycling is required,
- * so that we select tick to get range about 4 seconds.
- */
-#if HZ <= 16 || HZ > 4096
-# error Unsupported: HZ <= 16 or HZ > 4096
-#elif HZ <= 32
-# define INET_TWDR_RECYCLE_TICK (5 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 64
-# define INET_TWDR_RECYCLE_TICK (6 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 128
-# define INET_TWDR_RECYCLE_TICK (7 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 256
-# define INET_TWDR_RECYCLE_TICK (8 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 512
-# define INET_TWDR_RECYCLE_TICK (9 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 1024
-# define INET_TWDR_RECYCLE_TICK (10 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#elif HZ <= 2048
-# define INET_TWDR_RECYCLE_TICK (11 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#else
-# define INET_TWDR_RECYCLE_TICK (12 + 2 - INET_TWDR_RECYCLE_SLOTS_LOG)
-#endif
-
-static inline u32 inet_tw_time_stamp(void)
-{
-       return jiffies;
-}
-
-/* TIME_WAIT reaping mechanism. */
-#define INET_TWDR_TWKILL_SLOTS 8 /* Please keep this a power of 2. */
-
-#define INET_TWDR_TWKILL_QUOTA 100
-
 struct inet_timewait_death_row {
-       /* Short-time timewait calendar */
-       int                     twcal_hand;
-       unsigned long           twcal_jiffie;
-       struct timer_list       twcal_timer;
-       struct hlist_head       twcal_row[INET_TWDR_RECYCLE_SLOTS];
-
-       spinlock_t              death_lock;
-       int                     tw_count;
-       int                     period;
-       u32                     thread_slots;
-       struct work_struct      twkill_work;
-       struct timer_list       tw_timer;
-       int                     slot;
-       struct hlist_head       cells[INET_TWDR_TWKILL_SLOTS];
-       struct inet_hashinfo    *hashinfo;
+       atomic_t                tw_count;
+
+       struct inet_hashinfo    *hashinfo ____cacheline_aligned_in_smp;
        int                     sysctl_tw_recycle;
        int                     sysctl_max_tw_buckets;
 };
 
-void inet_twdr_hangman(unsigned long data);
-void inet_twdr_twkill_work(struct work_struct *work);
-void inet_twdr_twcal_tick(unsigned long data);
-
 struct inet_bind_bucket;
 
 /*
@@ -133,52 +80,18 @@ struct inet_timewait_sock {
        __be16                  tw_sport;
        kmemcheck_bitfield_begin(flags);
        /* And these are ours. */
-       unsigned int            tw_pad0         : 1,    /* 1 bit hole */
+       unsigned int            tw_kill         : 1,
                                tw_transparent  : 1,
                                tw_flowlabel    : 20,
                                tw_pad          : 2,    /* 2 bits hole */
                                tw_tos          : 8;
        kmemcheck_bitfield_end(flags);
-       u32                     tw_ttd;
+       struct timer_list       tw_timer;
        struct inet_bind_bucket *tw_tb;
-       struct hlist_node       tw_death_node;
+       struct inet_timewait_death_row *tw_dr;
 };
 #define tw_tclass tw_tos
 
-static inline int inet_twsk_dead_hashed(const struct inet_timewait_sock *tw)
-{
-       return !hlist_unhashed(&tw->tw_death_node);
-}
-
-static inline void inet_twsk_dead_node_init(struct inet_timewait_sock *tw)
-{
-       tw->tw_death_node.pprev = NULL;
-}
-
-static inline void __inet_twsk_del_dead_node(struct inet_timewait_sock *tw)
-{
-       __hlist_del(&tw->tw_death_node);
-       inet_twsk_dead_node_init(tw);
-}
-
-static inline int inet_twsk_del_dead_node(struct inet_timewait_sock *tw)
-{
-       if (inet_twsk_dead_hashed(tw)) {
-               __inet_twsk_del_dead_node(tw);
-               return 1;
-       }
-       return 0;
-}
-
-#define inet_twsk_for_each(tw, node, head) \
-       hlist_nulls_for_each_entry(tw, node, head, tw_node)
-
-#define inet_twsk_for_each_inmate(tw, jail) \
-       hlist_for_each_entry(tw, jail, tw_death_node)
-
-#define inet_twsk_for_each_inmate_safe(tw, safe, jail) \
-       hlist_for_each_entry_safe(tw, safe, jail, tw_death_node)
-
 static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk)
 {
        return (struct inet_timewait_sock *)sk;
@@ -193,16 +106,14 @@ int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
                          struct inet_hashinfo *hashinfo);
 
 struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
+                                          struct inet_timewait_death_row *dr,
                                           const int state);
 
 void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
                           struct inet_hashinfo *hashinfo);
 
-void inet_twsk_schedule(struct inet_timewait_sock *tw,
-                       struct inet_timewait_death_row *twdr,
-                       const int timeo, const int timewait_len);
-void inet_twsk_deschedule(struct inet_timewait_sock *tw,
-                         struct inet_timewait_death_row *twdr);
+void inet_twsk_schedule(struct inet_timewait_sock *tw, const int timeo);
+void inet_twsk_deschedule(struct inet_timewait_sock *tw);
 
 void inet_twsk_purge(struct inet_hashinfo *hashinfo,
                     struct inet_timewait_death_row *twdr, int family);
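
The header diff above summarizes the whole rework: the per-death-row recycle calendar, kill slots and death_lock disappear, and each timewait socket instead carries its own tw_timer plus a tw_dr backpointer. The resulting lifecycle, sketched with only names from this series:

	/*
	 *   tw = inet_twsk_alloc(sk, &tcp_death_row, state);  // checks tw_count
	 *   __inet_twsk_hashdance(tw, sk, &tcp_hashinfo);
	 *   inet_twsk_schedule(tw, timeo);     // arms tw->tw_timer
	 *      ... timer fires -> tw_timer_handler() -> inet_twsk_kill()
	 *   inet_twsk_deschedule(tw);          // early kill: del_timer_sync()
	 */
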
index bd6f523f2251a9efe255bb11c3d995df2b6f8db2..3a4898ec8c67c5242e9467a0d32f2e68f3a55302 100644 (file)
@@ -57,7 +57,6 @@
 #include <linux/page_counter.h>
 #include <linux/memcontrol.h>
 #include <linux/static_key.h>
-#include <linux/aio.h>
 #include <linux/sched.h>
 
 #include <linux/filter.h>
index f2902ef7ab75cdf2f81880b95db6fea4486bd461..4dce116bfd80c81ab09aec171817e082723ad74e 100644 (file)
@@ -47,7 +47,8 @@ struct rxrpc_header {
 #define RXRPC_PACKET_TYPE_CHALLENGE    6       /* connection security challenge (SRVR->CLNT) */
 #define RXRPC_PACKET_TYPE_RESPONSE    7       /* connection security response (CLNT->SRVR) */
 #define RXRPC_PACKET_TYPE_DEBUG                8       /* debug info request */
-#define RXRPC_N_PACKET_TYPES           9       /* number of packet types (incl type 0) */
+#define RXRPC_PACKET_TYPE_VERSION      13      /* version string request */
+#define RXRPC_N_PACKET_TYPES           14      /* number of packet types (incl type 0) */
 
        uint8_t         flags;          /* packet flags */
 #define RXRPC_CLIENT_INITIATED 0x01            /* signifies a packet generated by a client */
index bb0635bd74f26a2ecb9f651de9e0c4113e4f2476..879edfc5ee52d2985d4fb925ec820ba6b113d6d0 100644 (file)
@@ -32,7 +32,6 @@
 #include <linux/security.h>
 #include <linux/bootmem.h>
 #include <linux/memblock.h>
-#include <linux/aio.h>
 #include <linux/syscalls.h>
 #include <linux/kexec.h>
 #include <linux/kdb.h>
@@ -46,6 +45,7 @@
 #include <linux/irq_work.h>
 #include <linux/utsname.h>
 #include <linux/ctype.h>
+#include <linux/uio.h>
 
 #include <asm/uaccess.h>
 
@@ -521,7 +521,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
        int i;
        int level = default_message_loglevel;
        int facility = 1;       /* LOG_USER */
-       size_t len = iocb->ki_nbytes;
+       size_t len = iov_iter_count(from);
        ssize_t ret = len;
 
        if (len > LOG_LINE_MAX)
index ce410bb9f2e103e0fcfda7d7b844948a0a28fbce..4012336de30f6fe88688bc0366bba216af72ea9d 100644 (file)
@@ -19,6 +19,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/aio.h>
 #include <linux/mm.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
index 9d96e283520cc7f3ec27714dfa4abfcb3800e319..fc6e33f6b7f3376b365c1b04409eb23580b729e2 100644 (file)
@@ -766,3 +766,60 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
                                   flags);
 }
 EXPORT_SYMBOL(dup_iter);
+
+int import_iovec(int type, const struct iovec __user * uvector,
+                unsigned nr_segs, unsigned fast_segs,
+                struct iovec **iov, struct iov_iter *i)
+{
+       ssize_t n;
+       struct iovec *p;
+       n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
+                                 *iov, &p);
+       if (n < 0) {
+               if (p != *iov)
+                       kfree(p);
+               *iov = NULL;
+               return n;
+       }
+       iov_iter_init(i, type, p, nr_segs, n);
+       *iov = p == *iov ? NULL : p;
+       return 0;
+}
+EXPORT_SYMBOL(import_iovec);
+
+#ifdef CONFIG_COMPAT
+#include <linux/compat.h>
+
+int compat_import_iovec(int type, const struct compat_iovec __user * uvector,
+                unsigned nr_segs, unsigned fast_segs,
+                struct iovec **iov, struct iov_iter *i)
+{
+       ssize_t n;
+       struct iovec *p;
+       n = compat_rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
+                                 *iov, &p);
+       if (n < 0) {
+               if (p != *iov)
+                       kfree(p);
+               *iov = NULL;
+               return n;
+       }
+       iov_iter_init(i, type, p, nr_segs, n);
+       *iov = p == *iov ? NULL : p;
+       return 0;
+}
+#endif
+
+int import_single_range(int rw, void __user *buf, size_t len,
+                struct iovec *iov, struct iov_iter *i)
+{
+       if (len > MAX_RW_COUNT)
+               len = MAX_RW_COUNT;
+       if (unlikely(!access_ok(!rw, buf, len)))
+               return -EFAULT;
+
+       iov->iov_base = buf;
+       iov->iov_len = len;
+       iov_iter_init(i, rw, iov, 1, len);
+       return 0;
+}
index ad7242043bdb8b74872e536b61d01ca05a1de6b3..876f4e6f3ed6674537e5acad829ac43c7aa8e110 100644 (file)
@@ -13,7 +13,6 @@
 #include <linux/compiler.h>
 #include <linux/fs.h>
 #include <linux/uaccess.h>
-#include <linux/aio.h>
 #include <linux/capability.h>
 #include <linux/kernel_stat.h>
 #include <linux/gfp.h>
index e6045804c8d876db5c480d6c64e3c5f4e7bb7a84..a96c8562d83567466b6633dd169c84ff63418e66 100644 (file)
@@ -20,8 +20,8 @@
 #include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <linux/frontswap.h>
-#include <linux/aio.h>
 #include <linux/blkdev.h>
+#include <linux/uio.h>
 #include <asm/pgtable.h>
 
 static struct bio *get_swap_bio(gfp_t gfp_flags,
@@ -274,7 +274,6 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
                iov_iter_bvec(&from, ITER_BVEC | WRITE, &bv, 1, PAGE_SIZE);
                init_sync_kiocb(&kiocb, swap_file);
                kiocb.ki_pos = page_file_offset(page);
-               kiocb.ki_nbytes = PAGE_SIZE;
 
                set_page_writeback(page);
                unlock_page(page);
index cf2d0ca010bc52efd5ea86c7f6ba760a5c3ef286..80b360c7bcd1696a77cf74f44b334ce53d367688 100644 (file)
@@ -31,7 +31,7 @@
 #include <linux/mm.h>
 #include <linux/export.h>
 #include <linux/swap.h>
-#include <linux/aio.h>
+#include <linux/uio.h>
 
 static struct vfsmount *shm_mnt;
 
index c4b6b0f43d5d4243d8c35274a4669333bd9f5f25..5cfd26a0006f07d15ee62a5660e0177745f454cf 100644 (file)
 #include <asm/uaccess.h>
 #include <net/compat.h>
 
-ssize_t get_compat_msghdr(struct msghdr *kmsg,
-                         struct compat_msghdr __user *umsg,
-                         struct sockaddr __user **save_addr,
-                         struct iovec **iov)
+int get_compat_msghdr(struct msghdr *kmsg,
+                     struct compat_msghdr __user *umsg,
+                     struct sockaddr __user **save_addr,
+                     struct iovec **iov)
 {
        compat_uptr_t uaddr, uiov, tmp3;
        compat_size_t nr_segs;
@@ -81,13 +81,9 @@ ssize_t get_compat_msghdr(struct msghdr *kmsg,
 
        kmsg->msg_iocb = NULL;
 
-       err = compat_rw_copy_check_uvector(save_addr ? READ : WRITE,
-                                          compat_ptr(uiov), nr_segs,
-                                          UIO_FASTIOV, *iov, iov);
-       if (err >= 0)
-               iov_iter_init(&kmsg->msg_iter, save_addr ? READ : WRITE,
-                             *iov, nr_segs, err);
-       return err;
+       return compat_import_iovec(save_addr ? READ : WRITE,
+                                  compat_ptr(uiov), nr_segs,
+                                  UIO_FASTIOV, iov, &kmsg->msg_iter);
 }
 
 /* Bleech... */
index df493d68330c03d1cb5b59e40d31294d7f45b3f8..b80fb91bb3f7e8dc630663cb5e012dc97ac6924f 100644 (file)
@@ -673,7 +673,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
        if (!chunk)
                return 0;
 
-       if (iov_iter_count(&msg->msg_iter) < chunk) {
+       if (msg_data_left(msg) < chunk) {
                if (__skb_checksum_complete(skb))
                        goto csum_error;
                if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
index b2775f06c7102a00afbd170daae05468d660b5b1..af4a1b0adc104f626c65507b19bd99c6dc3dd0eb 100644 (file)
@@ -1630,6 +1630,22 @@ int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);
 
+#ifdef CONFIG_NET_CLS_ACT
+static struct static_key ingress_needed __read_mostly;
+
+void net_inc_ingress_queue(void)
+{
+       static_key_slow_inc(&ingress_needed);
+}
+EXPORT_SYMBOL_GPL(net_inc_ingress_queue);
+
+void net_dec_ingress_queue(void)
+{
+       static_key_slow_dec(&ingress_needed);
+}
+EXPORT_SYMBOL_GPL(net_dec_ingress_queue);
+#endif
+
 static struct static_key netstamp_needed __read_mostly;
 #ifdef HAVE_JUMP_LABEL
 /* We are not allowed to call static_key_slow_dec() from irq context
@@ -3547,7 +3563,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
        struct netdev_queue *rxq = rcu_dereference(skb->dev->ingress_queue);
 
        if (!rxq || rcu_access_pointer(rxq->qdisc) == &noop_qdisc)
-               goto out;
+               return skb;
 
        if (*pt_prev) {
                *ret = deliver_skb(skb, *pt_prev, orig_dev);
@@ -3561,8 +3577,6 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
                return NULL;
        }
 
-out:
-       skb->tc_verd = 0;
        return skb;
 }
 #endif
@@ -3698,12 +3712,15 @@ another_round:
 
 skip_taps:
 #ifdef CONFIG_NET_CLS_ACT
-       skb = handle_ing(skb, &pt_prev, &ret, orig_dev);
-       if (!skb)
-               goto unlock;
+       if (static_key_false(&ingress_needed)) {
+               skb = handle_ing(skb, &pt_prev, &ret, orig_dev);
+               if (!skb)
+                       goto unlock;
+       }
+
+       skb->tc_verd = 0;
 ncls:
 #endif
-
        if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
                goto drop;
 
index 332f7d6d994291c2cd8cded425c8e89965556acf..5f566663e47f3faa8025a998b2d8ae976db58860 100644 (file)
 
 struct inet_timewait_death_row dccp_death_row = {
        .sysctl_max_tw_buckets = NR_FILE * 2,
-       .period         = DCCP_TIMEWAIT_LEN / INET_TWDR_TWKILL_SLOTS,
-       .death_lock     = __SPIN_LOCK_UNLOCKED(dccp_death_row.death_lock),
        .hashinfo       = &dccp_hashinfo,
-       .tw_timer       = TIMER_INITIALIZER(inet_twdr_hangman, 0,
-                                           (unsigned long)&dccp_death_row),
-       .twkill_work    = __WORK_INITIALIZER(dccp_death_row.twkill_work,
-                                            inet_twdr_twkill_work),
-/* Short-time timewait calendar */
-
-       .twcal_hand     = -1,
-       .twcal_timer    = TIMER_INITIALIZER(inet_twdr_twcal_tick, 0,
-                                           (unsigned long)&dccp_death_row),
 };
 
 EXPORT_SYMBOL_GPL(dccp_death_row);
 
 void dccp_time_wait(struct sock *sk, int state, int timeo)
 {
-       struct inet_timewait_sock *tw = NULL;
+       struct inet_timewait_sock *tw;
 
-       if (dccp_death_row.tw_count < dccp_death_row.sysctl_max_tw_buckets)
-               tw = inet_twsk_alloc(sk, state);
+       tw = inet_twsk_alloc(sk, &dccp_death_row, state);
 
        if (tw != NULL) {
                const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -71,8 +59,7 @@ void dccp_time_wait(struct sock *sk, int state, int timeo)
                if (state == DCCP_TIME_WAIT)
                        timeo = DCCP_TIMEWAIT_LEN;
 
-               inet_twsk_schedule(tw, &dccp_death_row, timeo,
-                                  DCCP_TIMEWAIT_LEN);
+               inet_twsk_schedule(tw, timeo);
                inet_twsk_put(tw);
        } else {
                /* Sorry, if we're out of memory, just CLOSE this
index 263710259774151e40fa67ba3aa9652d4a1e2955..af150b43b214123b052c43ec7e40449af3d7ecd2 100644 (file)
@@ -886,12 +886,12 @@ EXPORT_SYMBOL(gue_build_header);
 
 #ifdef CONFIG_NET_FOU_IP_TUNNELS
 
-static const struct ip_tunnel_encap_ops __read_mostly fou_iptun_ops = {
+static const struct ip_tunnel_encap_ops fou_iptun_ops = {
        .encap_hlen = fou_encap_hlen,
        .build_header = fou_build_header,
 };
 
-static const struct ip_tunnel_encap_ops __read_mostly gue_iptun_ops = {
+static const struct ip_tunnel_encap_ops gue_iptun_ops = {
        .encap_hlen = gue_encap_hlen,
        .build_header = gue_build_header,
 };
index b77f5e84c623f055fe277ea2178a29589fadaf1b..8986e63f3bda61a6c8ba980c050b96ec90625107 100644 (file)
@@ -113,10 +113,6 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
        int min_headroom;
        int err;
 
-       skb = udp_tunnel_handle_offloads(skb, csum);
-       if (IS_ERR(skb))
-               return PTR_ERR(skb);
-
        min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
                        + GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
                        + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
@@ -131,6 +127,10 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
        if (unlikely(!skb))
                return -ENOMEM;
 
+       skb = udp_tunnel_handle_offloads(skb, csum);
+       if (IS_ERR(skb))
+               return PTR_ERR(skb);
+
        gnvh = (struct genevehdr *)__skb_push(skb, sizeof(*gnvh) + opt_len);
        geneve_build_header(gnvh, tun_flags, vni, opt_len, opt);
 
index 76322c9867d5eb1ffe7808c908e42208046888a7..70e8b3c308ec3bd5c3069d87558b8b24d644ed52 100644 (file)
@@ -248,7 +248,7 @@ static int inet_twsk_diag_fill(struct sock *sk,
        struct inet_timewait_sock *tw = inet_twsk(sk);
        struct inet_diag_msg *r;
        struct nlmsghdr *nlh;
-       s32 tmo;
+       long tmo;
 
        nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
                        nlmsg_flags);
@@ -258,7 +258,7 @@ static int inet_twsk_diag_fill(struct sock *sk,
        r = nlmsg_data(nlh);
        BUG_ON(tw->tw_state != TCP_TIME_WAIT);
 
-       tmo = tw->tw_ttd - inet_tw_time_stamp();
+       tmo = tw->tw_timer.expires - jiffies;
        if (tmo < 0)
                tmo = 0;
 
index d4630bf2d9aad1fd9070a11323b1cd0f7c0b9949..c6fb80bd5826ea840eebd033fb87d01c595ab120 100644 (file)
@@ -388,7 +388,7 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
                *twp = tw;
        } else if (tw) {
                /* Silly. Should hash-dance instead... */
-               inet_twsk_deschedule(tw, death_row);
+               inet_twsk_deschedule(tw);
 
                inet_twsk_put(tw);
        }
@@ -565,7 +565,7 @@ ok:
                spin_unlock(&head->lock);
 
                if (tw) {
-                       inet_twsk_deschedule(tw, death_row);
+                       inet_twsk_deschedule(tw);
                        while (twrefcnt) {
                                twrefcnt--;
                                inet_twsk_put(tw);
index 118f0f195820fa98554bafa5e1ddbd0da7c002c7..00ec8d5d7e7ee2f1c79dc7446127f13c7e23a331 100644 (file)
@@ -67,9 +67,9 @@ int inet_twsk_bind_unhash(struct inet_timewait_sock *tw,
 }
 
 /* Must be called with locally disabled BHs. */
-static void __inet_twsk_kill(struct inet_timewait_sock *tw,
-                            struct inet_hashinfo *hashinfo)
+static void inet_twsk_kill(struct inet_timewait_sock *tw)
 {
+       struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo;
        struct inet_bind_hashbucket *bhead;
        int refcnt;
        /* Unlink from established hashes. */
@@ -89,6 +89,8 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw,
 
        BUG_ON(refcnt >= atomic_read(&tw->tw_refcnt));
        atomic_sub(refcnt, &tw->tw_refcnt);
+       atomic_dec(&tw->tw_dr->tw_count);
+       inet_twsk_put(tw);
 }
 
 void inet_twsk_free(struct inet_timewait_sock *tw)
@@ -168,16 +170,34 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk,
 }
 EXPORT_SYMBOL_GPL(__inet_twsk_hashdance);
 
-struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int state)
+void tw_timer_handler(unsigned long data)
 {
-       struct inet_timewait_sock *tw =
-               kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
-                                GFP_ATOMIC);
+       struct inet_timewait_sock *tw = (struct inet_timewait_sock *)data;
+
+       if (tw->tw_kill)
+               NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED);
+       else
+               NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);
+       inet_twsk_kill(tw);
+}
+
+struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk,
+                                          struct inet_timewait_death_row *dr,
+                                          const int state)
+{
+       struct inet_timewait_sock *tw;
+
+       if (atomic_read(&dr->tw_count) >= dr->sysctl_max_tw_buckets)
+               return NULL;
+
+       tw = kmem_cache_alloc(sk->sk_prot_creator->twsk_prot->twsk_slab,
+                             GFP_ATOMIC);
        if (tw) {
                const struct inet_sock *inet = inet_sk(sk);
 
                kmemcheck_annotate_bitfield(tw, flags);
 
+               tw->tw_dr           = dr;
                /* Give us an identity. */
                tw->tw_daddr        = inet->inet_daddr;
                tw->tw_rcv_saddr    = inet->inet_rcv_saddr;
@@ -196,13 +216,14 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
                tw->tw_prot         = sk->sk_prot_creator;
                atomic64_set(&tw->tw_cookie, atomic64_read(&sk->sk_cookie));
                twsk_net_set(tw, sock_net(sk));
+               setup_timer(&tw->tw_timer, tw_timer_handler, (unsigned long)tw);
                /*
                 * Because we use RCU lookups, we should not set tw_refcnt
                 * to a non null value before everything is setup for this
                 * timewait socket.
                 */
                atomic_set(&tw->tw_refcnt, 0);
-               inet_twsk_dead_node_init(tw);
+
                __module_get(tw->tw_prot->owner);
        }
 
@@ -210,139 +231,20 @@ struct inet_timewait_sock *inet_twsk_alloc(const struct sock *sk, const int stat
 }
 EXPORT_SYMBOL_GPL(inet_twsk_alloc);
 
-/* Returns non-zero if quota exceeded.  */
-static int inet_twdr_do_twkill_work(struct inet_timewait_death_row *twdr,
-                                   const int slot)
-{
-       struct inet_timewait_sock *tw;
-       unsigned int killed;
-       int ret;
-
-       /* NOTE: compare this to previous version where lock
-        * was released after detaching chain. It was racy,
-        * because tw buckets are scheduled in not serialized context
-        * in 2.3 (with netfilter), and with softnet it is common, because
-        * soft irqs are not sequenced.
-        */
-       killed = 0;
-       ret = 0;
-rescan:
-       inet_twsk_for_each_inmate(tw, &twdr->cells[slot]) {
-               __inet_twsk_del_dead_node(tw);
-               spin_unlock(&twdr->death_lock);
-               __inet_twsk_kill(tw, twdr->hashinfo);
-#ifdef CONFIG_NET_NS
-               NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);
-#endif
-               inet_twsk_put(tw);
-               killed++;
-               spin_lock(&twdr->death_lock);
-               if (killed > INET_TWDR_TWKILL_QUOTA) {
-                       ret = 1;
-                       break;
-               }
-
-               /* While we dropped twdr->death_lock, another cpu may have
-                * killed off the next TW bucket in the list, therefore
-                * do a fresh re-read of the hlist head node with the
-                * lock reacquired.  We still use the hlist traversal
-                * macro in order to get the prefetches.
-                */
-               goto rescan;
-       }
-
-       twdr->tw_count -= killed;
-#ifndef CONFIG_NET_NS
-       NET_ADD_STATS_BH(&init_net, LINUX_MIB_TIMEWAITED, killed);
-#endif
-       return ret;
-}
-
-void inet_twdr_hangman(unsigned long data)
-{
-       struct inet_timewait_death_row *twdr;
-       unsigned int need_timer;
-
-       twdr = (struct inet_timewait_death_row *)data;
-       spin_lock(&twdr->death_lock);
-
-       if (twdr->tw_count == 0)
-               goto out;
-
-       need_timer = 0;
-       if (inet_twdr_do_twkill_work(twdr, twdr->slot)) {
-               twdr->thread_slots |= (1 << twdr->slot);
-               schedule_work(&twdr->twkill_work);
-               need_timer = 1;
-       } else {
-               /* We purged the entire slot, anything left?  */
-               if (twdr->tw_count)
-                       need_timer = 1;
-               twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1));
-       }
-       if (need_timer)
-               mod_timer(&twdr->tw_timer, jiffies + twdr->period);
-out:
-       spin_unlock(&twdr->death_lock);
-}
-EXPORT_SYMBOL_GPL(inet_twdr_hangman);
-
-void inet_twdr_twkill_work(struct work_struct *work)
-{
-       struct inet_timewait_death_row *twdr =
-               container_of(work, struct inet_timewait_death_row, twkill_work);
-       int i;
-
-       BUILD_BUG_ON((INET_TWDR_TWKILL_SLOTS - 1) >
-                       (sizeof(twdr->thread_slots) * 8));
-
-       while (twdr->thread_slots) {
-               spin_lock_bh(&twdr->death_lock);
-               for (i = 0; i < INET_TWDR_TWKILL_SLOTS; i++) {
-                       if (!(twdr->thread_slots & (1 << i)))
-                               continue;
-
-                       while (inet_twdr_do_twkill_work(twdr, i) != 0) {
-                               if (need_resched()) {
-                                       spin_unlock_bh(&twdr->death_lock);
-                                       schedule();
-                                       spin_lock_bh(&twdr->death_lock);
-                               }
-                       }
-
-                       twdr->thread_slots &= ~(1 << i);
-               }
-               spin_unlock_bh(&twdr->death_lock);
-       }
-}
-EXPORT_SYMBOL_GPL(inet_twdr_twkill_work);
-
 /* These are always called from BH context.  See callers in
  * tcp_input.c to verify this.
  */
 
 /* This is for handling early-kills of TIME_WAIT sockets. */
-void inet_twsk_deschedule(struct inet_timewait_sock *tw,
-                         struct inet_timewait_death_row *twdr)
+void inet_twsk_deschedule(struct inet_timewait_sock *tw)
 {
-       spin_lock(&twdr->death_lock);
-       if (inet_twsk_del_dead_node(tw)) {
-               inet_twsk_put(tw);
-               if (--twdr->tw_count == 0)
-                       del_timer(&twdr->tw_timer);
-       }
-       spin_unlock(&twdr->death_lock);
-       __inet_twsk_kill(tw, twdr->hashinfo);
+       if (del_timer_sync(&tw->tw_timer))
+               inet_twsk_kill(tw);
 }
 EXPORT_SYMBOL(inet_twsk_deschedule);
 
-void inet_twsk_schedule(struct inet_timewait_sock *tw,
-                      struct inet_timewait_death_row *twdr,
-                      const int timeo, const int timewait_len)
+void inet_twsk_schedule(struct inet_timewait_sock *tw, const int timeo)
 {
-       struct hlist_head *list;
-       int slot;
-
        /* timeout := RTO * 3.5
         *
         * 3.5 = 1+2+0.5 to wait for two retransmits.
@@ -367,115 +269,15 @@ void inet_twsk_schedule(struct inet_timewait_sock *tw,
         * is greater than TS tick!) and detect old duplicates with help
         * of PAWS.
         */
-       slot = (timeo + (1 << INET_TWDR_RECYCLE_TICK) - 1) >> INET_TWDR_RECYCLE_TICK;
 
-       spin_lock(&twdr->death_lock);
-
-       /* Unlink it, if it was scheduled */
-       if (inet_twsk_del_dead_node(tw))
-               twdr->tw_count--;
-       else
+       tw->tw_kill = timeo <= 4*HZ;
+       if (!mod_timer_pinned(&tw->tw_timer, jiffies + timeo)) {
                atomic_inc(&tw->tw_refcnt);
-
-       if (slot >= INET_TWDR_RECYCLE_SLOTS) {
-               /* Schedule to slow timer */
-               if (timeo >= timewait_len) {
-                       slot = INET_TWDR_TWKILL_SLOTS - 1;
-               } else {
-                       slot = DIV_ROUND_UP(timeo, twdr->period);
-                       if (slot >= INET_TWDR_TWKILL_SLOTS)
-                               slot = INET_TWDR_TWKILL_SLOTS - 1;
-               }
-               tw->tw_ttd = inet_tw_time_stamp() + timeo;
-               slot = (twdr->slot + slot) & (INET_TWDR_TWKILL_SLOTS - 1);
-               list = &twdr->cells[slot];
-       } else {
-               tw->tw_ttd = inet_tw_time_stamp() + (slot << INET_TWDR_RECYCLE_TICK);
-
-               if (twdr->twcal_hand < 0) {
-                       twdr->twcal_hand = 0;
-                       twdr->twcal_jiffie = jiffies;
-                       twdr->twcal_timer.expires = twdr->twcal_jiffie +
-                                             (slot << INET_TWDR_RECYCLE_TICK);
-                       add_timer(&twdr->twcal_timer);
-               } else {
-                       if (time_after(twdr->twcal_timer.expires,
-                                      jiffies + (slot << INET_TWDR_RECYCLE_TICK)))
-                               mod_timer(&twdr->twcal_timer,
-                                         jiffies + (slot << INET_TWDR_RECYCLE_TICK));
-                       slot = (twdr->twcal_hand + slot) & (INET_TWDR_RECYCLE_SLOTS - 1);
-               }
-               list = &twdr->twcal_row[slot];
+               atomic_inc(&tw->tw_dr->tw_count);
        }
-
-       hlist_add_head(&tw->tw_death_node, list);
-
-       if (twdr->tw_count++ == 0)
-               mod_timer(&twdr->tw_timer, jiffies + twdr->period);
-       spin_unlock(&twdr->death_lock);
 }
 EXPORT_SYMBOL_GPL(inet_twsk_schedule);
 
-void inet_twdr_twcal_tick(unsigned long data)
-{
-       struct inet_timewait_death_row *twdr;
-       int n, slot;
-       unsigned long j;
-       unsigned long now = jiffies;
-       int killed = 0;
-       int adv = 0;
-
-       twdr = (struct inet_timewait_death_row *)data;
-
-       spin_lock(&twdr->death_lock);
-       if (twdr->twcal_hand < 0)
-               goto out;
-
-       slot = twdr->twcal_hand;
-       j = twdr->twcal_jiffie;
-
-       for (n = 0; n < INET_TWDR_RECYCLE_SLOTS; n++) {
-               if (time_before_eq(j, now)) {
-                       struct hlist_node *safe;
-                       struct inet_timewait_sock *tw;
-
-                       inet_twsk_for_each_inmate_safe(tw, safe,
-                                                      &twdr->twcal_row[slot]) {
-                               __inet_twsk_del_dead_node(tw);
-                               __inet_twsk_kill(tw, twdr->hashinfo);
-#ifdef CONFIG_NET_NS
-                               NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITKILLED);
-#endif
-                               inet_twsk_put(tw);
-                               killed++;
-                       }
-               } else {
-                       if (!adv) {
-                               adv = 1;
-                               twdr->twcal_jiffie = j;
-                               twdr->twcal_hand = slot;
-                       }
-
-                       if (!hlist_empty(&twdr->twcal_row[slot])) {
-                               mod_timer(&twdr->twcal_timer, j);
-                               goto out;
-                       }
-               }
-               j += 1 << INET_TWDR_RECYCLE_TICK;
-               slot = (slot + 1) & (INET_TWDR_RECYCLE_SLOTS - 1);
-       }
-       twdr->twcal_hand = -1;
-
-out:
-       if ((twdr->tw_count -= killed) == 0)
-               del_timer(&twdr->tw_timer);
-#ifndef CONFIG_NET_NS
-       NET_ADD_STATS_BH(&init_net, LINUX_MIB_TIMEWAITKILLED, killed);
-#endif
-       spin_unlock(&twdr->death_lock);
-}
-EXPORT_SYMBOL_GPL(inet_twdr_twcal_tick);
-
 void inet_twsk_purge(struct inet_hashinfo *hashinfo,
                     struct inet_timewait_death_row *twdr, int family)
 {
@@ -509,7 +311,7 @@ restart:
 
                        rcu_read_unlock();
                        local_bh_disable();
-                       inet_twsk_deschedule(tw, twdr);
+                       inet_twsk_deschedule(tw);
                        local_bh_enable();
                        inet_twsk_put(tw);
                        goto restart_rcu;
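
The reference pairing that replaces the old death_lock accounting is worth spelling out: mod_timer_pinned() returns 0 when the timer was not already pending, so only the first arming in inet_twsk_schedule() takes the extra tw_refcnt and tw_count references, and tw_timer_handler() releases both through inet_twsk_kill(). In sketch form (all names from the diff above):

	/*
	 *   inet_twsk_schedule(tw, timeo)
	 *       !mod_timer_pinned(...)  ->  tw_refcnt++, tw_dr->tw_count++
	 *   tw_timer_handler(tw)
	 *       inet_twsk_kill(tw)      ->  tw_dr->tw_count--, inet_twsk_put(tw)
	 *   inet_twsk_deschedule(tw)
	 *       del_timer_sync() wins the race  ->  same kill path, early
	 */
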
index d8953ef0770ca14bbb9c883a18354bcac321db7b..e1f3b911dd1e3739a63e38b63a1b9a7b29bfd7f0 100644 (file)
@@ -63,7 +63,7 @@ static int sockstat_seq_show(struct seq_file *seq, void *v)
        socket_seq_show(seq);
        seq_printf(seq, "TCP: inuse %d orphan %d tw %d alloc %d mem %ld\n",
                   sock_prot_inuse_get(net, &tcp_prot), orphans,
-                  tcp_death_row.tw_count, sockets,
+                  atomic_read(&tcp_death_row.tw_count), sockets,
                   proto_memory_allocated(&tcp_prot));
        seq_printf(seq, "UDP: inuse %d mem %ld\n",
                   sock_prot_inuse_get(net, &udp_prot),
index c0bb648fb2f98dc5804e413b211276033465a872..561cd4b8fc6e07b49222d788651e93c42ee3adcb 100644 (file)
@@ -46,7 +46,6 @@
 #include <linux/stddef.h>
 #include <linux/slab.h>
 #include <linux/errno.h>
-#include <linux/aio.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/spinlock.h>
index 094a6822c71d8cc69b1be28a9c6bb511f8f8b87b..18e3a12eb1b283bd370bdb4c16f5969e30bcec15 100644 (file)
@@ -1119,7 +1119,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 
        sg = !!(sk->sk_route_caps & NETIF_F_SG);
 
-       while (iov_iter_count(&msg->msg_iter)) {
+       while (msg_data_left(msg)) {
                int copy = 0;
                int max = size_goal;
 
@@ -1163,8 +1163,8 @@ new_segment:
                }
 
                /* Try to append data to the end of skb. */
-               if (copy > iov_iter_count(&msg->msg_iter))
-                       copy = iov_iter_count(&msg->msg_iter);
+               if (copy > msg_data_left(msg))
+                       copy = msg_data_left(msg);
 
                /* Where to copy to? */
                if (skb_availroom(skb) > 0) {
@@ -1221,7 +1221,7 @@ new_segment:
                tcp_skb_pcount_set(skb, 0);
 
                copied += copy;
-               if (!iov_iter_count(&msg->msg_iter)) {
+               if (!msg_data_left(msg)) {
                        tcp_tx_timestamp(sk, skb);
                        goto out;
                }
index 031cf72cd05c8094de8a9a76cd4cffa13e45c0d5..a7ef679dd3ea434e5b3bda48d391e1a872ed2478 100644 (file)
@@ -3099,17 +3099,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
                        if (sacked & TCPCB_SACKED_RETRANS)
                                tp->retrans_out -= acked_pcount;
                        flag |= FLAG_RETRANS_DATA_ACKED;
-               } else {
+               } else if (!(sacked & TCPCB_SACKED_ACKED)) {
                        last_ackt = skb->skb_mstamp;
                        WARN_ON_ONCE(last_ackt.v64 == 0);
                        if (!first_ackt.v64)
                                first_ackt = last_ackt;
 
-                       if (!(sacked & TCPCB_SACKED_ACKED)) {
-                               reord = min(pkts_acked, reord);
-                               if (!after(scb->end_seq, tp->high_seq))
-                                       flag |= FLAG_ORIG_SACK_ACKED;
-                       }
+                       reord = min(pkts_acked, reord);
+                       if (!after(scb->end_seq, tp->high_seq))
+                               flag |= FLAG_ORIG_SACK_ACKED;
                }
 
                if (sacked & TCPCB_SACKED_ACKED)
index 37578d52897e58942b6e6b8ef4e8db6f0f245b85..3571f2be4470749b9a7c8178fa43a6bf2bac33d8 100644 (file)
@@ -1685,7 +1685,7 @@ do_time_wait:
                                                        iph->daddr, th->dest,
                                                        inet_iif(skb));
                if (sk2) {
-                       inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row);
+                       inet_twsk_deschedule(inet_twsk(sk));
                        inet_twsk_put(inet_twsk(sk));
                        sk = sk2;
                        goto process;
@@ -2242,9 +2242,9 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
 static void get_timewait4_sock(const struct inet_timewait_sock *tw,
                               struct seq_file *f, int i)
 {
+       long delta = tw->tw_timer.expires - jiffies;
        __be32 dest, src;
        __u16 destp, srcp;
-       s32 delta = tw->tw_ttd - inet_tw_time_stamp();
 
        dest  = tw->tw_daddr;
        src   = tw->tw_rcv_saddr;
index 2088fdcca14140f23aa01f60a1675c116f734a57..63d6311b5365944fae0bdaa0f7abfbc51f55a429 100644 (file)
@@ -34,18 +34,7 @@ int sysctl_tcp_abort_on_overflow __read_mostly;
 
 struct inet_timewait_death_row tcp_death_row = {
        .sysctl_max_tw_buckets = NR_FILE * 2,
-       .period         = TCP_TIMEWAIT_LEN / INET_TWDR_TWKILL_SLOTS,
-       .death_lock     = __SPIN_LOCK_UNLOCKED(tcp_death_row.death_lock),
        .hashinfo       = &tcp_hashinfo,
-       .tw_timer       = TIMER_INITIALIZER(inet_twdr_hangman, 0,
-                                           (unsigned long)&tcp_death_row),
-       .twkill_work    = __WORK_INITIALIZER(tcp_death_row.twkill_work,
-                                            inet_twdr_twkill_work),
-/* Short-time timewait calendar */
-
-       .twcal_hand     = -1,
-       .twcal_timer    = TIMER_INITIALIZER(inet_twdr_twcal_tick, 0,
-                                           (unsigned long)&tcp_death_row),
 };
 EXPORT_SYMBOL_GPL(tcp_death_row);
 
@@ -158,7 +147,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
                if (!th->fin ||
                    TCP_SKB_CB(skb)->end_seq != tcptw->tw_rcv_nxt + 1) {
 kill_with_rst:
-                       inet_twsk_deschedule(tw, &tcp_death_row);
+                       inet_twsk_deschedule(tw);
                        inet_twsk_put(tw);
                        return TCP_TW_RST;
                }
@@ -174,11 +163,9 @@ kill_with_rst:
                if (tcp_death_row.sysctl_tw_recycle &&
                    tcptw->tw_ts_recent_stamp &&
                    tcp_tw_remember_stamp(tw))
-                       inet_twsk_schedule(tw, &tcp_death_row, tw->tw_timeout,
-                                          TCP_TIMEWAIT_LEN);
+                       inet_twsk_schedule(tw, tw->tw_timeout);
                else
-                       inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
-                                          TCP_TIMEWAIT_LEN);
+                       inet_twsk_schedule(tw, TCP_TIMEWAIT_LEN);
                return TCP_TW_ACK;
        }
 
@@ -211,13 +198,12 @@ kill_with_rst:
                         */
                        if (sysctl_tcp_rfc1337 == 0) {
 kill:
-                               inet_twsk_deschedule(tw, &tcp_death_row);
+                               inet_twsk_deschedule(tw);
                                inet_twsk_put(tw);
                                return TCP_TW_SUCCESS;
                        }
                }
-               inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
-                                  TCP_TIMEWAIT_LEN);
+               inet_twsk_schedule(tw, TCP_TIMEWAIT_LEN);
 
                if (tmp_opt.saw_tstamp) {
                        tcptw->tw_ts_recent       = tmp_opt.rcv_tsval;
@@ -267,8 +253,7 @@ kill:
                 * Do not reschedule in the last case.
                 */
                if (paws_reject || th->ack)
-                       inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
-                                          TCP_TIMEWAIT_LEN);
+                       inet_twsk_schedule(tw, TCP_TIMEWAIT_LEN);
 
                return tcp_timewait_check_oow_rate_limit(
                        tw, skb, LINUX_MIB_TCPACKSKIPPEDTIMEWAIT);
@@ -283,16 +268,15 @@ EXPORT_SYMBOL(tcp_timewait_state_process);
  */
 void tcp_time_wait(struct sock *sk, int state, int timeo)
 {
-       struct inet_timewait_sock *tw = NULL;
        const struct inet_connection_sock *icsk = inet_csk(sk);
        const struct tcp_sock *tp = tcp_sk(sk);
+       struct inet_timewait_sock *tw;
        bool recycle_ok = false;
 
        if (tcp_death_row.sysctl_tw_recycle && tp->rx_opt.ts_recent_stamp)
                recycle_ok = tcp_remember_stamp(sk);
 
-       if (tcp_death_row.tw_count < tcp_death_row.sysctl_max_tw_buckets)
-               tw = inet_twsk_alloc(sk, state);
+       tw = inet_twsk_alloc(sk, &tcp_death_row, state);
 
        if (tw) {
                struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
@@ -355,8 +339,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
                                timeo = TCP_TIMEWAIT_LEN;
                }
 
-               inet_twsk_schedule(tw, &tcp_death_row, timeo,
-                                  TCP_TIMEWAIT_LEN);
+               inet_twsk_schedule(tw, timeo);
                inet_twsk_put(tw);
        } else {
                /* Sorry, if we're out of memory, just CLOSE this
index e662d85d1635d0269b669bb0f726760be3bae0d2..8c8d7e06b72fc1e5c4a50ca55136757f0501f8c0 100644 (file)
@@ -2994,6 +2994,8 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
        rcu_read_unlock();
 #endif
 
+       /* Do not fool tcpdump (if any), clean our debris */
+       skb->tstamp.tv64 = 0;
        return skb;
 }
 EXPORT_SYMBOL(tcp_make_synack);
index 033f17816ef4cf482d40eb496129a7d832c0a251..871641bc1ed4eb5b8f5f554c9efb75f2419fe5b6 100644 (file)
@@ -246,7 +246,7 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
                *twp = tw;
        } else if (tw) {
                /* Silly. Should hash-dance instead... */
-               inet_twsk_deschedule(tw, death_row);
+               inet_twsk_deschedule(tw);
 
                inet_twsk_put(tw);
        }
index b53148444e157f821c86b467b166fc9ce7bd5ccb..ed9d681207fa340881fd100db0ea1cb3eb9a2ffb 100644 (file)
@@ -288,8 +288,7 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
 static void vti6_dev_uninit(struct net_device *dev)
 {
        struct ip6_tnl *t = netdev_priv(dev);
-       struct net *net = dev_net(dev);
-       struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+       struct vti6_net *ip6n = net_generic(t->net, vti6_net_id);
 
        if (dev == ip6n->fb_tnl_dev)
                RCU_INIT_POINTER(ip6n->tnls_wc[0], NULL);
index f73a97f6e68ec8286972fadcf9328e29af123242..ad51df85aa00dda56b86dca24ed5fe9a3b5c791e 100644 (file)
@@ -1486,7 +1486,7 @@ do_time_wait:
                                            ntohs(th->dest), tcp_v6_iif(skb));
                if (sk2) {
                        struct inet_timewait_sock *tw = inet_twsk(sk);
-                       inet_twsk_deschedule(tw, &tcp_death_row);
+                       inet_twsk_deschedule(tw);
                        inet_twsk_put(tw);
                        sk = sk2;
                        tcp_v6_restore_cb(skb);
@@ -1728,9 +1728,9 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 static void get_timewait6_sock(struct seq_file *seq,
                               struct inet_timewait_sock *tw, int i)
 {
+       long delta = tw->tw_timer.expires - jiffies;
        const struct in6_addr *dest, *src;
        __u16 destp, srcp;
-       s32 delta = tw->tw_ttd - inet_tw_time_stamp();
 
        dest = &tw->tw_v6_daddr;
        src  = &tw->tw_v6_rcv_saddr;
index 51afea4b0af78a46c41099cd55ca9506f3636835..3ad91266c821489500fbc8cbbcfc7bfd774b6f48 100644 (file)
@@ -63,7 +63,7 @@ struct nfulnl_instance {
        struct timer_list timer;
        struct net *net;
        struct user_namespace *peer_user_ns;    /* User namespace of the peer process */
-       int peer_portid;                        /* PORTID of the peer process */
+       u32 peer_portid;                /* PORTID of the peer process */
 
        /* configurable parameters */
        unsigned int flushtimeout;      /* timeout until queue flush */
@@ -152,7 +152,7 @@ static void nfulnl_timer(unsigned long data);
 
 static struct nfulnl_instance *
 instance_create(struct net *net, u_int16_t group_num,
-               int portid, struct user_namespace *user_ns)
+               u32 portid, struct user_namespace *user_ns)
 {
        struct nfulnl_instance *inst;
        struct nfnl_log_net *log = nfnl_log_pernet(net);
@@ -1007,7 +1007,7 @@ static int seq_show(struct seq_file *s, void *v)
 {
        const struct nfulnl_instance *inst = v;
 
-       seq_printf(s, "%5d %6d %5d %1d %5d %6d %2d\n",
+       seq_printf(s, "%5u %6u %5u %1u %5u %6u %2u\n",
                   inst->group_num,
                   inst->peer_portid, inst->qlen,
                   inst->copy_mode, inst->copy_range,
index 628afc350c025f7012fa927c03ec3bdc6b3b6a2c..0b98c74202390ae79598ceb955360f937bb9556d 100644 (file)
@@ -55,7 +55,7 @@ struct nfqnl_instance {
        struct hlist_node hlist;                /* global list of queues */
        struct rcu_head rcu;
 
-       int peer_portid;
+       u32 peer_portid;
        unsigned int queue_maxlen;
        unsigned int copy_range;
        unsigned int queue_dropped;
@@ -110,8 +110,7 @@ instance_lookup(struct nfnl_queue_net *q, u_int16_t queue_num)
 }
 
 static struct nfqnl_instance *
-instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
-               int portid)
+instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 {
        struct nfqnl_instance *inst;
        unsigned int h;
@@ -870,7 +869,7 @@ static const struct nla_policy nfqa_verdict_batch_policy[NFQA_MAX+1] = {
 };
 
 static struct nfqnl_instance *
-verdict_instance_lookup(struct nfnl_queue_net *q, u16 queue_num, int nlportid)
+verdict_instance_lookup(struct nfnl_queue_net *q, u16 queue_num, u32 nlportid)
 {
        struct nfqnl_instance *queue;
 
@@ -1252,7 +1251,7 @@ static int seq_show(struct seq_file *s, void *v)
 {
        const struct nfqnl_instance *inst = v;
 
-       seq_printf(s, "%5d %6d %5d %1d %5d %5d %5d %8d %2d\n",
+       seq_printf(s, "%5u %6u %5u %1u %5u %5u %5u %8u %2d\n",
                   inst->queue_num,
                   inst->peer_portid, inst->queue_total,
                   inst->copy_mode, inst->copy_range,
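
The peer_portid type changes here and in nfnetlink_log follow the netlink_notify change earlier in this section: netlink port IDs are u32 and can occupy the full 32-bit range, so storing or printing them as int turns large IDs negative. A standalone illustration of the formatting half of that, in plain userspace C:

	#include <stdio.h>

	int main(void)
	{
		unsigned int portid = 3000000000u;	/* a legal 32-bit port ID */

		printf("%d\n", (int)portid);	/* prints -1294967296 */
		printf("%u\n", portid);		/* prints 3000000000 */
		return 0;
	}
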
index c205b26a2beea67d4eec9a0d22d18f97c73edd8d..cca96cec1b689fcd104e273a64db6eda44171beb 100644 (file)
@@ -272,7 +272,7 @@ tproxy_handle_time_wait4(struct sk_buff *skb, __be32 laddr, __be16 lport,
                                            hp->source, lport ? lport : hp->dest,
                                            skb->dev, NFT_LOOKUP_LISTENER);
                if (sk2) {
-                       inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row);
+                       inet_twsk_deschedule(inet_twsk(sk));
                        inet_twsk_put(inet_twsk(sk));
                        sk = sk2;
                }
@@ -437,7 +437,7 @@ tproxy_handle_time_wait6(struct sk_buff *skb, int tproto, int thoff,
                                            tgi->lport ? tgi->lport : hp->dest,
                                            skb->dev, NFT_LOOKUP_LISTENER);
                if (sk2) {
-                       inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row);
+                       inet_twsk_deschedule(inet_twsk(sk));
                        inet_twsk_put(inet_twsk(sk));
                        sk = sk2;
                }
index 14a2d11581da7ededf0eff4ca09293a277853b2d..3763036710aedfc38768793c44388b70375ae71b 100644 (file)
@@ -1584,7 +1584,7 @@ static const struct genl_ops nfc_genl_ops[] = {
 
 struct urelease_work {
        struct  work_struct w;
-       int     portid;
+       u32     portid;
 };
 
 static void nfc_urelease_event_work(struct work_struct *work)
index 378c3a6acf84cab59346ab832d2fffba33e6e543..14f041398ca1744ea7596decaad7145184c7df0c 100644 (file)
@@ -130,7 +130,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
        rcu_read_lock();
        conn = rds_conn_lookup(head, laddr, faddr, trans);
        if (conn && conn->c_loopback && conn->c_trans != &rds_loop_transport &&
-           !is_outgoing) {
+           laddr == faddr && !is_outgoing) {
                /* This is a looped back IB connection, and we're
                 * called by the code handling the incoming connect.
                 * We need a second connection object into which we
@@ -193,6 +193,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr,
        }
 
        atomic_set(&conn->c_state, RDS_CONN_DOWN);
+       conn->c_send_gen = 0;
        conn->c_reconnect_jiffies = 0;
        INIT_DELAYED_WORK(&conn->c_send_w, rds_send_worker);
        INIT_DELAYED_WORK(&conn->c_recv_w, rds_recv_worker);
index c3f2855c3d8432272f7899608513a499558d9ad8..0d41155a2258cbbd16e19171c3daa376e3a83877 100644 (file)
@@ -110,6 +110,7 @@ struct rds_connection {
        void                    *c_transport_data;
 
        atomic_t                c_state;
+       unsigned long           c_send_gen;
        unsigned long           c_flags;
        unsigned long           c_reconnect_jiffies;
        struct delayed_work     c_send_w;
index 44672befc0ee29a3e04ca01768c087fd0abd2f36..e9430f537f9c2bb23bbaeeb66933e1e85058bd34 100644 (file)
@@ -140,8 +140,11 @@ int rds_send_xmit(struct rds_connection *conn)
        struct scatterlist *sg;
        int ret = 0;
        LIST_HEAD(to_be_dropped);
+       int batch_count;
+       unsigned long send_gen = 0;
 
 restart:
+       batch_count = 0;
 
        /*
         * sendmsg calls here after having queued its message on the send
@@ -156,6 +159,17 @@ restart:
                goto out;
        }
 
+       /*
+        * We record the send generation after doing the xmit acquire.
+        * If someone else manages to jump in and do some work, we'll use
+        * this to avoid a goto restart farther down.
+        *
+        * The acquire_in_xmit() check above ensures that only one
+        * caller can increment c_send_gen at any time.
+        */
+       conn->c_send_gen++;
+       send_gen = conn->c_send_gen;
+
        /*
         * rds_conn_shutdown() sets the conn state and then tests RDS_IN_XMIT,
         * we do the opposite to avoid races.
@@ -202,6 +216,16 @@ restart:
                if (!rm) {
                        unsigned int len;
 
+                       batch_count++;
+
+                       /* We want to process as big a batch as we can, but
+                        * we also want to avoid softlockups.  If we've been
+                        * through a lot of messages, let's back off and see
+                        * if anyone else jumps in.
+                        */
+                       if (batch_count >= 1024)
+                               goto over_batch;
+
                        spin_lock_irqsave(&conn->c_lock, flags);
 
                        if (!list_empty(&conn->c_send_queue)) {
@@ -357,9 +381,9 @@ restart:
                }
        }
 
+over_batch:
        if (conn->c_trans->xmit_complete)
                conn->c_trans->xmit_complete(conn);
-
        release_in_xmit(conn);
 
        /* Nuke any messages we decided not to retransmit. */
@@ -380,10 +404,15 @@ restart:
         * If the transport cannot continue (i.e ret != 0), then it must
         * call us when more room is available, such as from the tx
         * completion handler.
+        *
+        * We have an extra generation check here so that if someone manages
+        * to jump in after our release_in_xmit, we'll see that they have done
+        * some work and we will skip our goto.
         */
        if (ret == 0) {
                smp_mb();
-               if (!list_empty(&conn->c_send_queue)) {
+               if (!list_empty(&conn->c_send_queue) &&
+                   send_gen == conn->c_send_gen) {
                        rds_stats_inc(s_send_lock_queue_raced);
                        goto restart;
                }
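
Stripped of the RDS specifics, the c_send_gen logic above is a
generation-counter handoff: the transmit owner bumps the generation while it
holds the slot, and after releasing it only retries when nobody else has
bumped it since. A condensed sketch with illustrative names
(acquire_in_xmit()/release_in_xmit() stand in for the RDS_IN_XMIT bit
helpers, queue_nonempty() for the c_send_queue check):

  struct conn {
          unsigned long send_gen;         /* advanced only by the slot owner */
          /* ... RDS_IN_XMIT flag, send queue, ... */
  };

  bool acquire_in_xmit(struct conn *c);   /* illustrative helpers */
  void release_in_xmit(struct conn *c);
  bool queue_nonempty(struct conn *c);

  static void xmit(struct conn *c)
  {
          unsigned long my_gen;

  restart:
          if (!acquire_in_xmit(c))        /* someone else owns the slot */
                  return;
          my_gen = ++c->send_gen;         /* serialized by slot ownership */

          /* ... drain up to a batch of messages, then release ... */

          release_in_xmit(c);
          smp_mb();
          /*
           * Retry only if work remains and no newer owner appeared; a
           * newer generation means someone else will (or did) look at
           * the queue, so the goto can be skipped.
           */
          if (queue_nonempty(c) && my_gen == c->send_gen)
                  goto restart;
  }
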
index 481f89f93789a147fd5979e894e62145f4d9d767..4505a691d88c283bbbd8038c2dd088825015bd32 100644 (file)
@@ -28,7 +28,7 @@
 const char *rxrpc_pkts[] = {
        "?00",
        "DATA", "ACK", "BUSY", "ABORT", "ACKALL", "CHALL", "RESP", "DEBUG",
-       "?09", "?10", "?11", "?12", "?13", "?14", "?15"
+       "?09", "?10", "?11", "?12", "VERSION", "?14", "?15"
 };
 
 /*
@@ -593,6 +593,20 @@ static void rxrpc_post_packet_to_conn(struct rxrpc_connection *conn,
        rxrpc_queue_conn(conn);
 }
 
+/*
+ * post endpoint-level events to the local endpoint
+ * - this includes debug and version messages
+ */
+static void rxrpc_post_packet_to_local(struct rxrpc_local *local,
+                                      struct sk_buff *skb)
+{
+       _enter("%p,%p", local, skb);
+
+       atomic_inc(&local->usage);
+       skb_queue_tail(&local->event_queue, skb);
+       rxrpc_queue_work(&local->event_processor);
+}
+
 static struct rxrpc_connection *rxrpc_conn_from_local(struct rxrpc_local *local,
                                               struct sk_buff *skb,
                                               struct rxrpc_skb_priv *sp)
@@ -699,6 +713,11 @@ void rxrpc_data_ready(struct sock *sk)
                goto bad_message;
        }
 
+       if (sp->hdr.type == RXRPC_PACKET_TYPE_VERSION) {
+               rxrpc_post_packet_to_local(local, skb);
+               goto out;
+       }
+
        if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
            (sp->hdr.callNumber == 0 || sp->hdr.seq == 0))
                goto bad_message;
@@ -731,6 +750,8 @@ void rxrpc_data_ready(struct sock *sk)
                else
                        goto cant_route_call;
        }
+
+out:
        rxrpc_put_local(local);
        return;
 
index 2fc1e659e5c9ec1e14ac9bf897988907e9edd1b0..aef1bd294e1796b68052936bdc8d0b62b52814fe 100644 (file)
@@ -152,11 +152,13 @@ struct rxrpc_local {
        struct work_struct      destroyer;      /* endpoint destroyer */
        struct work_struct      acceptor;       /* incoming call processor */
        struct work_struct      rejecter;       /* packet reject writer */
+       struct work_struct      event_processor; /* endpoint event processor */
        struct list_head        services;       /* services listening on this endpoint */
        struct list_head        link;           /* link in endpoint list */
        struct rw_semaphore     defrag_sem;     /* control re-enablement of IP DF bit */
        struct sk_buff_head     accept_queue;   /* incoming calls awaiting acceptance */
        struct sk_buff_head     reject_queue;   /* packets awaiting rejection */
+       struct sk_buff_head     event_queue;    /* endpoint event packets awaiting processing */
        spinlock_t              lock;           /* access lock */
        rwlock_t                services_lock;  /* lock for services list */
        atomic_t                usage;
index 87f7135d238b498f9208543cf4608c1cb78a59d3..ca904ed5400a11bd08e47fea56d0caeb30f0a442 100644 (file)
 #include <linux/net.h>
 #include <linux/skbuff.h>
 #include <linux/slab.h>
+#include <linux/udp.h>
+#include <linux/ip.h>
 #include <net/sock.h>
 #include <net/af_rxrpc.h>
+#include <generated/utsrelease.h>
 #include "ar-internal.h"
 
+static const char rxrpc_version_string[65] = "linux-" UTS_RELEASE " AF_RXRPC";
+
 static LIST_HEAD(rxrpc_locals);
 DEFINE_RWLOCK(rxrpc_local_lock);
 static DECLARE_RWSEM(rxrpc_local_sem);
 static DECLARE_WAIT_QUEUE_HEAD(rxrpc_local_wq);
 
 static void rxrpc_destroy_local(struct work_struct *work);
+static void rxrpc_process_local_events(struct work_struct *work);
 
 /*
  * allocate a new local
@@ -37,11 +43,13 @@ struct rxrpc_local *rxrpc_alloc_local(struct sockaddr_rxrpc *srx)
                INIT_WORK(&local->destroyer, &rxrpc_destroy_local);
                INIT_WORK(&local->acceptor, &rxrpc_accept_incoming_calls);
                INIT_WORK(&local->rejecter, &rxrpc_reject_packets);
+               INIT_WORK(&local->event_processor, &rxrpc_process_local_events);
                INIT_LIST_HEAD(&local->services);
                INIT_LIST_HEAD(&local->link);
                init_rwsem(&local->defrag_sem);
                skb_queue_head_init(&local->accept_queue);
                skb_queue_head_init(&local->reject_queue);
+               skb_queue_head_init(&local->event_queue);
                spin_lock_init(&local->lock);
                rwlock_init(&local->services_lock);
                atomic_set(&local->usage, 1);
@@ -264,10 +272,12 @@ static void rxrpc_destroy_local(struct work_struct *work)
        ASSERT(list_empty(&local->services));
        ASSERT(!work_pending(&local->acceptor));
        ASSERT(!work_pending(&local->rejecter));
+       ASSERT(!work_pending(&local->event_processor));
 
        /* finish cleaning up the local descriptor */
        rxrpc_purge_queue(&local->accept_queue);
        rxrpc_purge_queue(&local->reject_queue);
+       rxrpc_purge_queue(&local->event_queue);
        kernel_sock_shutdown(local->socket, SHUT_RDWR);
        sock_release(local->socket);
 
@@ -308,3 +318,91 @@ void __exit rxrpc_destroy_all_locals(void)
 
        _leave("");
 }
+
+/*
+ * Reply to a version request
+ */
+static void rxrpc_send_version_request(struct rxrpc_local *local,
+                                      struct rxrpc_header *hdr,
+                                      struct sk_buff *skb)
+{
+       struct sockaddr_in sin;
+       struct msghdr msg;
+       struct kvec iov[2];
+       size_t len;
+       int ret;
+
+       _enter("");
+
+       sin.sin_family = AF_INET;
+       sin.sin_port = udp_hdr(skb)->source;
+       sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
+
+       msg.msg_name    = &sin;
+       msg.msg_namelen = sizeof(sin);
+       msg.msg_control = NULL;
+       msg.msg_controllen = 0;
+       msg.msg_flags   = 0;
+
+       hdr->seq        = 0;
+       hdr->serial     = 0;
+       hdr->type       = RXRPC_PACKET_TYPE_VERSION;
+       hdr->flags      = RXRPC_LAST_PACKET | (~hdr->flags & RXRPC_CLIENT_INITIATED);
+       hdr->userStatus = 0;
+       hdr->_rsvd      = 0;
+
+       iov[0].iov_base = hdr;
+       iov[0].iov_len  = sizeof(*hdr);
+       iov[1].iov_base = (char *)rxrpc_version_string;
+       iov[1].iov_len  = sizeof(rxrpc_version_string);
+
+       len = iov[0].iov_len + iov[1].iov_len;
+
+       _proto("Tx VERSION (reply)");
+
+       ret = kernel_sendmsg(local->socket, &msg, iov, 2, len);
+       if (ret < 0)
+               _debug("sendmsg failed: %d", ret);
+
+       _leave("");
+}
+
+/*
+ * Process event packets targeted at a local endpoint.
+ */
+static void rxrpc_process_local_events(struct work_struct *work)
+{
+       struct rxrpc_local *local = container_of(work, struct rxrpc_local, event_processor);
+       struct sk_buff *skb;
+       char v;
+
+       _enter("");
+
+       atomic_inc(&local->usage);
+
+       while ((skb = skb_dequeue(&local->event_queue))) {
+               struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+
+               kdebug("{%d},{%u}", local->debug_id, sp->hdr.type);
+
+               switch (sp->hdr.type) {
+               case RXRPC_PACKET_TYPE_VERSION:
+                       if (skb_copy_bits(skb, 0, &v, 1) < 0)
+                               return;
+                       _proto("Rx VERSION { %02x }", v);
+                       if (v == 0)
+                               rxrpc_send_version_request(local, &sp->hdr, skb);
+                       break;
+
+               default:
+                       /* Just ignore anything we don't understand */
+                       break;
+               }
+
+               rxrpc_put_local(local);
+               rxrpc_free_skb(skb);
+       }
+
+       rxrpc_put_local(local);
+       _leave("");
+}
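
On the wire, the reply built by rxrpc_send_version_request() is just the
received header echoed back with the type set to VERSION, seq/serial cleared
and the client-initiated flag inverted, followed by the fixed 65-byte
NUL-padded version string; schematically (layout only, this struct is not
part of the series):

  struct rxrpc_version_reply {
          struct rxrpc_header hdr;        /* type = VERSION, seq = serial = 0 */
          char version[65];               /* "linux-" UTS_RELEASE " AF_RXRPC",
                                           * NUL padded to the full 65 bytes */
  };
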
index 09f584566e234ba7e53e76a653ee87883f419549..c0042807bfc6a5e2b6e03d70fbcffe097be73326 100644 (file)
@@ -542,11 +542,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
        call->tx_pending = NULL;
 
        copied = 0;
-       if (len > iov_iter_count(&msg->msg_iter))
-               len = iov_iter_count(&msg->msg_iter);
-       while (len) {
-               int copy;
-
+       do {
                if (!skb) {
                        size_t size, chunk, max, space;
 
@@ -568,8 +564,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                        max &= ~(call->conn->size_align - 1UL);
 
                        chunk = max;
-                       if (chunk > len && !more)
-                               chunk = len;
+                       if (chunk > msg_data_left(msg) && !more)
+                               chunk = msg_data_left(msg);
 
                        space = chunk + call->conn->size_align;
                        space &= ~(call->conn->size_align - 1UL);
@@ -612,23 +608,23 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                sp = rxrpc_skb(skb);
 
                /* append next segment of data to the current buffer */
-               copy = skb_tailroom(skb);
-               ASSERTCMP(copy, >, 0);
-               if (copy > len)
-                       copy = len;
-               if (copy > sp->remain)
-                       copy = sp->remain;
-
-               _debug("add");
-               ret = skb_add_data(skb, &msg->msg_iter, copy);
-               _debug("added");
-               if (ret < 0)
-                       goto efault;
-               sp->remain -= copy;
-               skb->mark += copy;
-               copied += copy;
-
-               len -= copy;
+               if (msg_data_left(msg) > 0) {
+                       int copy = skb_tailroom(skb);
+                       ASSERTCMP(copy, >, 0);
+                       if (copy > msg_data_left(msg))
+                               copy = msg_data_left(msg);
+                       if (copy > sp->remain)
+                               copy = sp->remain;
+
+                       _debug("add");
+                       ret = skb_add_data(skb, &msg->msg_iter, copy);
+                       _debug("added");
+                       if (ret < 0)
+                               goto efault;
+                       sp->remain -= copy;
+                       skb->mark += copy;
+                       copied += copy;
+               }
 
                /* check for the far side aborting the call or a network error
                 * occurring */
@@ -636,7 +632,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                        goto call_aborted;
 
                /* add the packet to the send queue if it's now full */
-               if (sp->remain <= 0 || (!len && !more)) {
+               if (sp->remain <= 0 ||
+                   (msg_data_left(msg) == 0 && !more)) {
                        struct rxrpc_connection *conn = call->conn;
                        uint32_t seq;
                        size_t pad;
@@ -666,7 +663,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
                        sp->hdr.serviceId = conn->service_id;
 
                        sp->hdr.flags = conn->out_clientflag;
-                       if (len == 0 && !more)
+                       if (msg_data_left(msg) == 0 && !more)
                                sp->hdr.flags |= RXRPC_LAST_PACKET;
                        else if (CIRC_SPACE(call->acks_head, call->acks_tail,
                                            call->acks_winsz) > 1)
@@ -682,10 +679,10 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
 
                        memcpy(skb->head, &sp->hdr,
                               sizeof(struct rxrpc_header));
-                       rxrpc_queue_packet(call, skb, !iov_iter_count(&msg->msg_iter) && !more);
+                       rxrpc_queue_packet(call, skb, !msg_data_left(msg) && !more);
                        skb = NULL;
                }
-       }
+       } while (msg_data_left(msg) > 0);
 
 success:
        ret = copied;
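
For reference, msg_data_left() is a thin wrapper over the iterator count,
which is why it can replace the hand-maintained len in the loop above without
changing behaviour; roughly (as defined in include/linux/socket.h around this
time):

  static inline size_t msg_data_left(struct msghdr *msg)
  {
          return iov_iter_count(&msg->msg_iter);
  }
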
index eb5b8445fef989c1f331fa1485fa39c649181a82..4cdbfb85686a7ee55d71d0a7c1ea5cdd7e789a22 100644 (file)
@@ -88,11 +88,19 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 /* ------------------------------------------------------------- */
 
+static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
+{
+       net_inc_ingress_queue();
+
+       return 0;
+}
+
 static void ingress_destroy(struct Qdisc *sch)
 {
        struct ingress_qdisc_data *p = qdisc_priv(sch);
 
        tcf_destroy_chain(&p->filter_list);
+       net_dec_ingress_queue();
 }
 
 static int ingress_dump(struct Qdisc *sch, struct sk_buff *skb)
@@ -124,6 +132,7 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
        .id             =       "ingress",
        .priv_size      =       sizeof(struct ingress_qdisc_data),
        .enqueue        =       ingress_enqueue,
+       .init           =       ingress_init,
        .destroy        =       ingress_destroy,
        .dump           =       ingress_dump,
        .owner          =       THIS_MODULE,
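
net_inc_ingress_queue()/net_dec_ingress_queue() exist so the RX fast path can
skip the ingress hook entirely unless at least one ingress qdisc is
instantiated. A plausible core-side counterpart, assuming a jump-label
refcount (sketch, not taken from this series):

  static struct static_key ingress_needed __read_mostly;

  void net_inc_ingress_queue(void)
  {
          static_key_slow_inc(&ingress_needed);   /* patch the ingress branch in */
  }

  void net_dec_ingress_queue(void)
  {
          static_key_slow_dec(&ingress_needed);   /* patch it back out */
  }
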
index 179f1c8c0d8bba4aa00705e6c9ca41ef909f7d50..956ead2cab9ad89f36835039a9b728d24a58ca41 100644 (file)
@@ -560,8 +560,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 tfifo_dequeue:
        skb = __skb_dequeue(&sch->q);
        if (skb) {
-deliver:
                qdisc_qstats_backlog_dec(sch, skb);
+deliver:
                qdisc_unthrottled(sch);
                qdisc_bstats_update(sch, skb);
                return skb;
@@ -578,6 +578,7 @@ deliver:
                        rb_erase(p, &q->t_root);
 
                        sch->q.qlen--;
+                       qdisc_qstats_backlog_dec(sch, skb);
                        skb->next = NULL;
                        skb->prev = NULL;
                        skb->tstamp = netem_skb_cb(skb)->tstamp_save;
index 073809f4125f276799418342f9c60519a3da82e9..5b0126234606dac94152b69fb42a571b1f3cc7bf 100644 (file)
@@ -610,35 +610,27 @@ void __sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
 }
 EXPORT_SYMBOL(__sock_tx_timestamp);
 
-static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg,
-                                    size_t size)
+static inline int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg)
 {
-       return sock->ops->sendmsg(sock, msg, size);
+       int ret = sock->ops->sendmsg(sock, msg, msg_data_left(msg));
+       BUG_ON(ret == -EIOCBQUEUED);
+       return ret;
 }
 
-int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
+int sock_sendmsg(struct socket *sock, struct msghdr *msg)
 {
-       int err = security_socket_sendmsg(sock, msg, size);
+       int err = security_socket_sendmsg(sock, msg,
+                                         msg_data_left(msg));
 
-       return err ?: sock_sendmsg_nosec(sock, msg, size);
+       return err ?: sock_sendmsg_nosec(sock, msg);
 }
 EXPORT_SYMBOL(sock_sendmsg);
 
 int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
                   struct kvec *vec, size_t num, size_t size)
 {
-       mm_segment_t oldfs = get_fs();
-       int result;
-
-       set_fs(KERNEL_DS);
-       /*
-        * the following is safe, since for compiler definitions of kvec and
-        * iovec are identical, yielding the same in-core layout and alignment
-        */
-       iov_iter_init(&msg->msg_iter, WRITE, (struct iovec *)vec, num, size);
-       result = sock_sendmsg(sock, msg, size);
-       set_fs(oldfs);
-       return result;
+       iov_iter_kvec(&msg->msg_iter, WRITE | ITER_KVEC, vec, num, size);
+       return sock_sendmsg(sock, msg);
 }
 EXPORT_SYMBOL(kernel_sendmsg);
 
@@ -755,12 +747,8 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
        mm_segment_t oldfs = get_fs();
        int result;
 
+       iov_iter_kvec(&msg->msg_iter, READ | ITER_KVEC, vec, num, size);
        set_fs(KERNEL_DS);
-       /*
-        * the following is safe, since for compiler definitions of kvec and
-        * iovec are identical, yielding the same in-core layout and alignment
-        */
-       iov_iter_init(&msg->msg_iter, READ, (struct iovec *)vec, num, size);
        result = sock_recvmsg(sock, msg, size, flags);
        set_fs(oldfs);
        return result;
@@ -808,10 +796,10 @@ static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
        if (iocb->ki_pos != 0)
                return -ESPIPE;
 
-       if (iocb->ki_nbytes == 0)       /* Match SYS5 behaviour */
+       if (!iov_iter_count(to))        /* Match SYS5 behaviour */
                return 0;
 
-       res = sock_recvmsg(sock, &msg, iocb->ki_nbytes, msg.msg_flags);
+       res = sock_recvmsg(sock, &msg, iov_iter_count(to), msg.msg_flags);
        *to = msg.msg_iter;
        return res;
 }
@@ -833,7 +821,7 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
        if (sock->type == SOCK_SEQPACKET)
                msg.msg_flags |= MSG_EOR;
 
-       res = sock_sendmsg(sock, &msg, iocb->ki_nbytes);
+       res = sock_sendmsg(sock, &msg);
        *from = msg.msg_iter;
        return res;
 }
@@ -1650,18 +1638,14 @@ SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
        struct iovec iov;
        int fput_needed;
 
-       if (len > INT_MAX)
-               len = INT_MAX;
-       if (unlikely(!access_ok(VERIFY_READ, buff, len)))
-               return -EFAULT;
+       err = import_single_range(WRITE, buff, len, &iov, &msg.msg_iter);
+       if (unlikely(err))
+               return err;
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (!sock)
                goto out;
 
-       iov.iov_base = buff;
-       iov.iov_len = len;
        msg.msg_name = NULL;
-       iov_iter_init(&msg.msg_iter, WRITE, &iov, 1, len);
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
        msg.msg_namelen = 0;
@@ -1675,7 +1659,7 @@ SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
        if (sock->file->f_flags & O_NONBLOCK)
                flags |= MSG_DONTWAIT;
        msg.msg_flags = flags;
-       err = sock_sendmsg(sock, &msg, len);
+       err = sock_sendmsg(sock, &msg);
 
 out_put:
        fput_light(sock->file, fput_needed);
@@ -1710,26 +1694,22 @@ SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
        int err, err2;
        int fput_needed;
 
-       if (size > INT_MAX)
-               size = INT_MAX;
-       if (unlikely(!access_ok(VERIFY_WRITE, ubuf, size)))
-               return -EFAULT;
+       err = import_single_range(READ, ubuf, size, &iov, &msg.msg_iter);
+       if (unlikely(err))
+               return err;
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (!sock)
                goto out;
 
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
-       iov.iov_len = size;
-       iov.iov_base = ubuf;
-       iov_iter_init(&msg.msg_iter, READ, &iov, 1, size);
        /* Save some cycles and don't copy the address if not needed */
        msg.msg_name = addr ? (struct sockaddr *)&address : NULL;
        /* We assume all kernel code knows the size of sockaddr_storage */
        msg.msg_namelen = 0;
        if (sock->file->f_flags & O_NONBLOCK)
                flags |= MSG_DONTWAIT;
-       err = sock_recvmsg(sock, &msg, size, flags);
+       err = sock_recvmsg(sock, &msg, iov_iter_count(&msg.msg_iter), flags);
 
        if (err >= 0 && addr != NULL) {
                err2 = move_addr_to_user(&address,
@@ -1849,10 +1829,10 @@ struct used_address {
        unsigned int name_len;
 };
 
-static ssize_t copy_msghdr_from_user(struct msghdr *kmsg,
-                                    struct user_msghdr __user *umsg,
-                                    struct sockaddr __user **save_addr,
-                                    struct iovec **iov)
+static int copy_msghdr_from_user(struct msghdr *kmsg,
+                                struct user_msghdr __user *umsg,
+                                struct sockaddr __user **save_addr,
+                                struct iovec **iov)
 {
        struct sockaddr __user *uaddr;
        struct iovec __user *uiov;
@@ -1898,13 +1878,8 @@ static ssize_t copy_msghdr_from_user(struct msghdr *kmsg,
 
        kmsg->msg_iocb = NULL;
 
-       err = rw_copy_check_uvector(save_addr ? READ : WRITE,
-                                   uiov, nr_segs,
-                                   UIO_FASTIOV, *iov, iov);
-       if (err >= 0)
-               iov_iter_init(&kmsg->msg_iter, save_addr ? READ : WRITE,
-                             *iov, nr_segs, err);
-       return err;
+       return import_iovec(save_addr ? READ : WRITE, uiov, nr_segs,
+                           UIO_FASTIOV, iov, &kmsg->msg_iter);
 }
 
 static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
@@ -1919,7 +1894,7 @@ static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
            __attribute__ ((aligned(sizeof(__kernel_size_t))));
        /* 20 is size of ipv6_pktinfo */
        unsigned char *ctl_buf = ctl;
-       int ctl_len, total_len;
+       int ctl_len;
        ssize_t err;
 
        msg_sys->msg_name = &address;
@@ -1929,8 +1904,7 @@ static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
        else
                err = copy_msghdr_from_user(msg_sys, msg, NULL, &iov);
        if (err < 0)
-               goto out_freeiov;
-       total_len = err;
+               return err;
 
        err = -ENOBUFS;
 
@@ -1977,10 +1951,10 @@ static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg,
            used_address->name_len == msg_sys->msg_namelen &&
            !memcmp(&used_address->name, msg_sys->msg_name,
                    used_address->name_len)) {
-               err = sock_sendmsg_nosec(sock, msg_sys, total_len);
+               err = sock_sendmsg_nosec(sock, msg_sys);
                goto out_freectl;
        }
-       err = sock_sendmsg(sock, msg_sys, total_len);
+       err = sock_sendmsg(sock, msg_sys);
        /*
         * If this is sendmmsg() and sending to current destination address was
         * successful, remember it.
@@ -1996,8 +1970,7 @@ out_freectl:
        if (ctl_buf != ctl)
                sock_kfree_s(sock->sk, ctl_buf, ctl_len);
 out_freeiov:
-       if (iov != iovstack)
-               kfree(iov);
+       kfree(iov);
        return err;
 }
 
@@ -2122,8 +2095,8 @@ static int ___sys_recvmsg(struct socket *sock, struct user_msghdr __user *msg,
        else
                err = copy_msghdr_from_user(msg_sys, msg, &uaddr, &iov);
        if (err < 0)
-               goto out_freeiov;
-       total_len = err;
+               return err;
+       total_len = iov_iter_count(&msg_sys->msg_iter);
 
        cmsg_ptr = (unsigned long)msg_sys->msg_control;
        msg_sys->msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT);
@@ -2161,8 +2134,7 @@ static int ___sys_recvmsg(struct socket *sock, struct user_msghdr __user *msg,
        err = len;
 
 out_freeiov:
-       if (iov != iovstack)
-               kfree(iov);
+       kfree(iov);
        return err;
 }
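
With kernel_sendmsg() now building a kvec iterator directly, kernel callers
get the old semantics without the set_fs() dance on the data path; a minimal
usage sketch (the function name is illustrative):

  static int send_kernel_buf(struct socket *sock, void *buf, size_t len)
  {
          struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
          struct kvec vec = { .iov_base = buf, .iov_len = len };

          return kernel_sendmsg(sock, &msg, &vec, 1, len);
  }
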
 
index cc331b6cf573d95340fddf0ea0e3ae8dd8ae8f3d..0c8120229a0353967138d20c5fe009a1c6e81d1f 100644 (file)
@@ -257,7 +257,7 @@ static int svc_sendto(struct svc_rqst *rqstp, struct xdr_buf *xdr)
 
                svc_set_cmsg_data(rqstp, cmh);
 
-               if (sock_sendmsg(sock, &msg, 0) < 0)
+               if (sock_sendmsg(sock, &msg) < 0)
                        goto out;
        }
 
index 85d1d476461257b3e248bd870cd9f7217fa21657..526c4feb3b50d723d24b8c55288c8c941257da52 100644 (file)
@@ -238,11 +238,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
                skb->sp->xvec[skb->sp->len++] = x;
 
-               if (xfrm_tunnel_check(skb, x, family)) {
-                       XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
-                       goto drop;
-               }
-
                spin_lock(&x->lock);
                if (unlikely(x->km.state == XFRM_STATE_ACQ)) {
                        XFRM_INC_STATS(net, LINUX_MIB_XFRMACQUIREERROR);
@@ -271,6 +266,11 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
                spin_unlock(&x->lock);
 
+               if (xfrm_tunnel_check(skb, x, family)) {
+                       XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
+                       goto drop;
+               }
+
                seq_hi = htonl(xfrm_replay_seqhi(x, seq));
 
                XFRM_SKB_CB(skb)->seq.input.low = seq;
index 30594bfa5fb1c00fffed6e353b21acd4afbf3c28..2bbb41822d8ec8882f8dacbbb4c5f8a1feac59ca 100644 (file)
@@ -153,6 +153,8 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
 
        switch (sclass) {
        case SECCLASS_NETLINK_ROUTE_SOCKET:
+               /* RTM_MAX always points to RTM_SETxxxx, i.e. RTM_NEWxxx + 3 */
+               BUILD_BUG_ON(RTM_MAX != (RTM_NEWNSID + 3));
                err = nlmsg_perm(nlmsg_type, perm, nlmsg_route_perms,
                                 sizeof(nlmsg_route_perms));
                break;
@@ -163,6 +165,7 @@ int selinux_nlmsg_lookup(u16 sclass, u16 nlmsg_type, u32 *perm)
                break;
 
        case SECCLASS_NETLINK_XFRM_SOCKET:
+               BUILD_BUG_ON(XFRM_MSG_MAX != XFRM_MSG_MAPPING);
                err = nlmsg_perm(nlmsg_type, perm, nlmsg_xfrm_perms,
                                 sizeof(nlmsg_xfrm_perms));
                break;
index 279e24f613051fddb8ca16375ab9031e6a703b03..a69ebc79bc5008e8251c8837a5ea973eb2c458b9 100644 (file)
@@ -25,7 +25,6 @@
 #include <linux/slab.h>
 #include <linux/time.h>
 #include <linux/pm_qos.h>
-#include <linux/aio.h>
 #include <linux/io.h>
 #include <linux/dma-mapping.h>
 #include <sound/core.h>
@@ -35,6 +34,7 @@
 #include <sound/pcm_params.h>
 #include <sound/timer.h>
 #include <sound/minors.h>
+#include <linux/uio.h>
 
 /*
  *  Compatibility