From: Linus Torvalds
Date: Mon, 13 Oct 2014 13:58:15 +0000 (+0200)
Subject: Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for the --field option, so that one can
     add fields to the default list of fields to show, i.e. now one can
     just do:

         perf report --fields +pid

     and the pid will appear in addition to the default fields (Jiri Olsa)

   - Add +field argument support for the --sort option (Jiri Olsa)

   - Honour -w in the report tools (report, top), allowing one to specify
     the widths of the histogram entry columns (Namhyung Kim)

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add a beautifier for the mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel-style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non-root users; kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in the config file
     (Namhyung Kim)

   - Support operations for shared futexes (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       - Save the pid string in opts.target.pid
       - Enable the target.system_wide flag
       - Unify the title bar output

   - [ plus lots of other fixes and small improvements ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing routines
     (Matt Fleming)

   - Improve DSO long-name lookup with an rbtree, resulting in a great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors. Fix it by filtering entries in the events file
     descriptor array after poll() returns, refcounting mmaps so that
     when the last fd pointing to a perf mmap goes away we do the unmap
     (Arnaldo Carvalho de Melo)

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let a default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add a 'flush' callback to the scripting API

   - Use the ring buffer consume method to look like other tools
     (Arnaldo Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid of
     unused variables and reducing source code size by handling similar
     cases in fewer functions (Namhyung Kim)
   - Replace thread-unsafe strerror() with strerror_r() across the whole
     tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...

---
9d9420f1209a1facea7110d549ac695f5aeeb503

diff --combined arch/x86/kernel/cpu/Makefile
index 77dcab277710,7e1fd4e08552..01d5453b5502
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@@ -13,13 -13,10 +13,13 @@@ nostackp := $(call cc-option, -fno-stac
  CFLAGS_common.o		:= $(nostackp)
 
  obj-y			:= intel_cacheinfo.o scattered.o topology.o
 -obj-y			+= proc.o capflags.o powerflags.o common.o
 +obj-y			+= common.o
  obj-y			+= rdrand.o
  obj-y			+= match.o
 +obj-$(CONFIG_PROC_FS)	+= proc.o
 +obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
 +
  obj-$(CONFIG_X86_32)	+= bugs.o
  obj-$(CONFIG_X86_64)	+= bugs_64.o

@@@ -39,7 -36,9 +39,9 @@@ obj-$(CONFIG_CPU_SUP_AMD) += perf_even
  endif
  obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
  obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
- obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_intel_uncore.o perf_event_intel_rapl.o
+ obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_intel_uncore.o perf_event_intel_uncore_snb.o
+ obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_intel_uncore_snbep.o perf_event_intel_uncore_nhmex.o
+ obj-$(CONFIG_CPU_SUP_INTEL)	+= perf_event_intel_rapl.o
  endif

@@@ -51,7 -50,6 +53,7 @@@ obj-$(CONFIG_X86_LOCAL_APIC) += perfct
  obj-$(CONFIG_HYPERVISOR_GUEST)	+= vmware.o hypervisor.o mshyperv.o
 
 +ifdef CONFIG_X86_FEATURE_NAMES
  quiet_cmd_mkcapflags = MKCAP   $@
        cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@

@@@ -60,4 -58,3 +62,4 @@@ cpufeature = $(src)/../../include/asm/c
  targets += capflags.c
  $(obj)/capflags.c: $(cpufeature) $(src)/mkcapflags.sh FORCE
  	$(call if_changed,mkcapflags)
 +endif

diff --combined include/linux/pci_ids.h
index 2338e68398cb,3102b7e3b460..0aa51b451597
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@@ -2245,8 -2245,6 +2245,8 @@@
  #define PCI_VENDOR_ID_MORETON		0x15aa
  #define PCI_DEVICE_ID_RASTEL_2PORT	0x2000
 
 +#define PCI_VENDOR_ID_VMWARE		0x15ad
 +
  #define PCI_VENDOR_ID_ZOLTRIX		0x15b0
  #define PCI_DEVICE_ID_ZOLTRIX_2BD0	0x2bd0

@@@ -2538,6 -2536,7 +2538,7 @@@
  #define PCI_DEVICE_ID_INTEL_EESSC	0x0008
  #define PCI_DEVICE_ID_INTEL_SNB_IMC	0x0100
  #define PCI_DEVICE_ID_INTEL_IVB_IMC	0x0154
+ #define PCI_DEVICE_ID_INTEL_IVB_E3_IMC	0x0150
  #define PCI_DEVICE_ID_INTEL_HSW_IMC	0x0c00
  #define PCI_DEVICE_ID_INTEL_PXHD_0	0x0320
  #define PCI_DEVICE_ID_INTEL_PXHD_1	0x0321

@@@ -2820,22 -2819,7 +2821,22 @@@
  #define PCI_DEVICE_ID_INTEL_UNC_R2PCIE	0x3c43
  #define PCI_DEVICE_ID_INTEL_UNC_R3QPI0	0x3c44
  #define PCI_DEVICE_ID_INTEL_UNC_R3QPI1	0x3c45
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_RAS	0x3c71	/* 15.1 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR0	0x3c72	/* 16.2 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR1	0x3c73	/* 16.3 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR2	0x3c76	/* 16.6 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_ERR3	0x3c77	/* 16.7 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0	0x3ca0	/* 14.0 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA	0x3ca8	/* 15.0 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0	0x3caa	/* 15.2 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD1	0x3cab	/* 15.3 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD2	0x3cac	/* 15.4 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD3	0x3cad	/* 15.5 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO	0x3cb8	/* 17.0 */
  #define PCI_DEVICE_ID_INTEL_JAKETOWN_UBOX	0x3ce0
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD0	0x3cf4	/* 12.6 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_BR		0x3cf5	/* 13.6 */
 +#define PCI_DEVICE_ID_INTEL_SBRIDGE_SAD1	0x3cf6	/* 12.7 */
  #define PCI_DEVICE_ID_INTEL_IOAT_SNB	0x402f
  #define PCI_DEVICE_ID_INTEL_5100_16	0x65f0
  #define PCI_DEVICE_ID_INTEL_5100_19	0x65f3

diff --combined kernel/events/core.c
index b1c663593f5c,b164cb07b30d..385f11d94105
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@@ -47,6 -47,8 +47,8 @@@
  #include 
 
+ static struct workqueue_struct *perf_wq;
+ 
  struct remote_function_call {
  	struct task_struct	*p;
  	int			(*func)(void *info);

@@@ -120,6 -122,13 +122,13 @@@ static int cpu_function_call(int cpu, i
  	return data.ret;
  }

+ #define EVENT_OWNER_KERNEL ((void *) -1)
+ 
+ static bool is_kernel_event(struct perf_event *event)
+ {
+ 	return event->owner == EVENT_OWNER_KERNEL;
+ }
+ 
  #define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
  		       PERF_FLAG_FD_OUTPUT  |\
  		       PERF_FLAG_PID_CGROUP |\

@@@ -392,9 -401,14 +401,9 @@@ perf_cgroup_match(struct perf_event *ev
  		    event->cgrp->css.cgroup);
  }

 -static inline void perf_put_cgroup(struct perf_event *event)
 -{
 -	css_put(&event->cgrp->css);
 -}
 -
  static inline void perf_detach_cgroup(struct perf_event *event)
  {
 -	perf_put_cgroup(event);
 +	css_put(&event->cgrp->css);
  	event->cgrp = NULL;
  }

@@@ -1370,6 -1384,45 +1379,45 @@@ out
  	perf_event__header_size(tmp);
  }

+ /*
+  * User event without the task.
+  */
+ static bool is_orphaned_event(struct perf_event *event)
+ {
+ 	return event && !is_kernel_event(event) && !event->owner;
+ }
+ 
+ /*
+  * Event has a parent but the parent's task finished; it is
+  * alive only because of children holding a reference.
+  */
+ static bool is_orphaned_child(struct perf_event *event)
+ {
+ 	return is_orphaned_event(event->parent);
+ }
+ 
+ static void orphans_remove_work(struct work_struct *work);
+ 
+ static void schedule_orphans_remove(struct perf_event_context *ctx)
+ {
+ 	if (!ctx->task || ctx->orphans_remove_sched || !perf_wq)
+ 		return;
+ 
+ 	if (queue_delayed_work(perf_wq, &ctx->orphans_remove, 1)) {
+ 		get_ctx(ctx);
+ 		ctx->orphans_remove_sched = true;
+ 	}
+ }
+ 
+ static int __init perf_workqueue_init(void)
+ {
+ 	perf_wq = create_singlethread_workqueue("perf");
+ 	WARN(!perf_wq, "failed to create perf workqueue\n");
+ 	return perf_wq ? 0 : -1;
+ }
+ 
+ core_initcall(perf_workqueue_init);
+ 
  static inline int
  event_filter_match(struct perf_event *event)
  {

@@@ -1419,6 -1472,9 +1467,9 @@@ event_sched_out(struct perf_event *even
  	if (event->attr.exclusive || !cpuctx->active_oncpu)
  		cpuctx->exclusive = 0;

+ 	if (is_orphaned_child(event))
+ 		schedule_orphans_remove(ctx);
+ 
  	perf_pmu_enable(event->pmu);
  }

@@@ -1519,11 -1575,6 +1570,11 @@@ retry
  	 */
  	if (ctx->is_active) {
  		raw_spin_unlock_irq(&ctx->lock);
 +		/*
 +		 * Reload the task pointer, it might have been changed by
 +		 * a concurrent perf_event_context_sched_out().
 +		 */
 +		task = ctx->task;
  		goto retry;
  	}

@@@ -1726,6 -1777,9 +1777,9 @@@ event_sched_in(struct perf_event *event
  	if (event->attr.exclusive)
  		cpuctx->exclusive = 1;

+ 	if (is_orphaned_child(event))
+ 		schedule_orphans_remove(ctx);
+ 
  out:
  	perf_pmu_enable(event->pmu);

@@@ -1967,11 -2021,6 +2021,11 @@@ retry
  	 */
  	if (ctx->is_active) {
  		raw_spin_unlock_irq(&ctx->lock);
 +		/*
 +		 * Reload the task pointer, it might have been changed by
 +		 * a concurrent perf_event_context_sched_out().
 +		 */
 +		task = ctx->task;
  		goto retry;
  	}

@@@ -2326,7 -2375,7 +2380,7 @@@ static void perf_event_context_sched_ou
  	next_parent = rcu_dereference(next_ctx->parent_ctx);

  	/* If neither context has a parent context, they cannot be clones. */
 -	if (!parent || !next_parent)
 +	if (!parent && !next_parent)
  		goto unlock;

  	if (next_parent == ctx || next_ctx == parent || next_parent == parent) {

@@@ -3073,6 -3122,7 +3127,7 @@@ static void __perf_event_init_context(s
  	INIT_LIST_HEAD(&ctx->flexible_groups);
  	INIT_LIST_HEAD(&ctx->event_list);
  	atomic_set(&ctx->refcount, 1);
+ 	INIT_DELAYED_WORK(&ctx->orphans_remove, orphans_remove_work);
  }

  static struct perf_event_context *

@@@ -3318,16 -3368,12 +3373,12 @@@ static void free_event(struct perf_even
  }

  /*
-  * Called when the last reference to the file is gone.
+  * Remove user event from the owner task.
   */
- static void put_event(struct perf_event *event)
+ static void perf_remove_from_owner(struct perf_event *event)
  {
- 	struct perf_event_context *ctx = event->ctx;
  	struct task_struct *owner;

- 	if (!atomic_long_dec_and_test(&event->refcount))
- 		return;
- 
  	rcu_read_lock();
  	owner = ACCESS_ONCE(event->owner);
  	/*

@@@ -3360,6 -3406,20 +3411,20 @@@
  		mutex_unlock(&owner->perf_event_mutex);
  		put_task_struct(owner);
  	}
+ }
+ 
+ /*
+  * Called when the last reference to the file is gone.
+  */
+ static void put_event(struct perf_event *event)
+ {
+ 	struct perf_event_context *ctx = event->ctx;
+ 
+ 	if (!atomic_long_dec_and_test(&event->refcount))
+ 		return;
+ 
+ 	if (!is_kernel_event(event))
+ 		perf_remove_from_owner(event);

  	WARN_ON_ONCE(ctx->parent_ctx);
  	/*

@@@ -3394,6 -3454,42 +3459,42 @@@ static int perf_release(struct inode *i
  	return 0;
  }

+ /*
+  * Remove all orphaned events from the context.
+  */
+ static void orphans_remove_work(struct work_struct *work)
+ {
+ 	struct perf_event_context *ctx;
+ 	struct perf_event *event, *tmp;
+ 
+ 	ctx = container_of(work, struct perf_event_context,
+ 			   orphans_remove.work);
+ 
+ 	mutex_lock(&ctx->mutex);
+ 	list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry) {
+ 		struct perf_event *parent_event = event->parent;
+ 
+ 		if (!is_orphaned_child(event))
+ 			continue;
+ 
+ 		perf_remove_from_context(event, true);
+ 
+ 		mutex_lock(&parent_event->child_mutex);
+ 		list_del_init(&event->child_list);
+ 		mutex_unlock(&parent_event->child_mutex);
+ 
+ 		free_event(event);
+ 		put_event(parent_event);
+ 	}
+ 
+ 	raw_spin_lock_irq(&ctx->lock);
+ 	ctx->orphans_remove_sched = false;
+ 	raw_spin_unlock_irq(&ctx->lock);
+ 	mutex_unlock(&ctx->mutex);
+ 
+ 	put_ctx(ctx);
+ }
+ 
  u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
  {
  	struct perf_event *child;

@@@ -3491,6 -3587,19 +3592,19 @@@ static int perf_event_read_one(struct p
  	return n * sizeof(u64);
  }

+ static bool is_event_hup(struct perf_event *event)
+ {
+ 	bool no_children;
+ 
+ 	if (event->state != PERF_EVENT_STATE_EXIT)
+ 		return false;
+ 
+ 	mutex_lock(&event->child_mutex);
+ 	no_children = list_empty(&event->child_list);
+ 	mutex_unlock(&event->child_mutex);
+ 	return no_children;
+ }
+ 
  /*
   * Read the performance event - simple non blocking version for now
   */

@@@ -3532,7 -3641,12 +3646,12 @@@ static unsigned int perf_poll(struct fi
  {
  	struct perf_event *event = file->private_data;
  	struct ring_buffer *rb;
- 	unsigned int events = POLL_HUP;
+ 	unsigned int events = POLLHUP;
+ 
+ 	poll_wait(file, &event->waitq, wait);
+ 
+ 	if (is_event_hup(event))
+ 		return events;

  	/*
  	 * Pin the event->rb by taking event->mmap_mutex; otherwise

@@@ -3543,9 -3657,6 +3662,6 @@@
  	if (rb)
  		events = atomic_xchg(&rb->poll, 0);
  	mutex_unlock(&event->mmap_mutex);
- 
- 	poll_wait(file, &event->waitq, wait);
- 
  	return events;
  }

@@@ -5809,7 -5920,7 +5925,7 @@@ static void swevent_hlist_release(struc
  	if (!hlist)
  		return;

- 	rcu_assign_pointer(swhash->swevent_hlist, NULL);
+ 	RCU_INIT_POINTER(swhash->swevent_hlist, NULL);
  	kfree_rcu(hlist, rcu_head);
  }

@@@ -7392,6 -7503,9 +7508,9 @@@ perf_event_create_kernel_counter(struc
  		goto err;
  	}

+ 	/* Mark owner so we can distinguish it from user events. */
+ 	event->owner = EVENT_OWNER_KERNEL;
+ 
  	account_event(event);

  	ctx = find_get_context(event->pmu, task, cpu);

@@@ -7478,6 -7592,12 +7597,12 @@@ static void sync_child_event(struct per
  	list_del_init(&child_event->child_list);
  	mutex_unlock(&parent_event->child_mutex);

+ 	/*
+ 	 * Make sure the user/parent gets notified that we just
+ 	 * lost one event.
+ 	 */
+ 	perf_event_wakeup(parent_event);
+ 
  	/*
  	 * Release the parent event, if this was the last
  	 * reference to it.
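/*
 * [Editor's aside -- an illustrative sketch, not part of the diff above.]
 * The perf_poll()/is_event_hup()/PERF_EVENT_STATE_EXIT changes in this
 * merge are what let user space observe POLLHUP once a monitored task is
 * gone, instead of poll() blocking forever. A minimal user-space
 * demonstration, assuming x86_64 Linux and a 'sleep' binary in PATH; the
 * hardware counter chosen here is arbitrary:
 */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	struct pollfd pfd;
	pid_t child;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.exclude_kernel = 1;	/* friendlier to perf_event_paranoid */

	child = fork();
	if (child == 0) {
		/* Short-lived workload to monitor. */
		execlp("sleep", "sleep", "1", (char *)NULL);
		_exit(127);
	}

	/* perf_event_open() has no glibc wrapper; use the raw syscall. */
	fd = syscall(__NR_perf_event_open, &attr, child, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	pfd.fd = fd;
	pfd.events = POLLIN;

	/*
	 * On kernels with this merge, poll() returns with POLLHUP set
	 * once the child exits; older kernels could block here forever.
	 */
	poll(&pfd, 1, -1);
	if (pfd.revents & POLLHUP)
		printf("POLLHUP: monitored task exited\n");

	close(fd);
	return 0;
}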
@@@ -7512,6 -7632,9 +7637,9 @@@ __perf_event_exit_task(struct perf_even
  	if (child_event->parent) {
  		sync_child_event(child_event, child);
  		free_event(child_event);
+ 	} else {
+ 		child_event->state = PERF_EVENT_STATE_EXIT;
+ 		perf_event_wakeup(child_event);
  	}
  }

@@@ -7695,6 -7818,7 +7823,7 @@@ inherit_event(struct perf_event *parent
  	      struct perf_event *group_leader,
  	      struct perf_event_context *child_ctx)
  {
+ 	enum perf_event_active_state parent_state = parent_event->state;
  	struct perf_event *child_event;
  	unsigned long flags;

@@@ -7715,7 -7839,8 +7844,8 @@@
  	if (IS_ERR(child_event))
  		return child_event;

- 	if (!atomic_long_inc_not_zero(&parent_event->refcount)) {
+ 	if (is_orphaned_event(parent_event) ||
+ 	    !atomic_long_inc_not_zero(&parent_event->refcount)) {
  		free_event(child_event);
  		return NULL;
  	}

@@@ -7727,7 -7852,7 +7857,7 @@@
  	 * not its attr.disabled bit.  We hold the parent's mutex,
  	 * so we won't race with perf_event_{en, dis}able_family.
  	 */
- 	if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
+ 	if (parent_state >= PERF_EVENT_STATE_INACTIVE)
  		child_event->state = PERF_EVENT_STATE_INACTIVE;
  	else
  		child_event->state = PERF_EVENT_STATE_OFF;

@@@ -7943,10 -8068,8 +8073,10 @@@ int perf_event_init_task(struct task_st
 
  	for_each_task_context_nr(ctxn) {
  		ret = perf_event_init_context(child, ctxn);
- 		if (ret)
+ 		if (ret) {
+ 			perf_event_free_task(child);
  			return ret;
+ 		}
  	}
 
  	return 0;
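Editor's note: the EVENT_OWNER_KERNEL sentinel added in this merge is a small
reusable pattern. Instead of adding a flag field, a never-valid pointer value
distinguishes kernel-owned events (owner == (void *)-1) from live user events
(owner == a task) and from orphans (owner == NULL). A self-contained sketch of
the same three-way test; all names here are illustrative, not taken from the
kernel:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Sentinel: an address that can never point to a real task. */
    #define OWNER_KERNEL ((void *)-1)

    struct fake_event {
    	void *owner;	/* NULL = orphaned, OWNER_KERNEL = kernel, else task */
    };

    static bool is_kernel_owned(const struct fake_event *e)
    {
    	return e->owner == OWNER_KERNEL;
    }

    static bool is_orphaned(const struct fake_event *e)
    {
    	return !is_kernel_owned(e) && !e->owner;
    }

    int main(void)
    {
    	int task;	/* stand-in for a task_struct */
    	struct fake_event user   = { .owner = &task };
    	struct fake_event kern   = { .owner = OWNER_KERNEL };
    	struct fake_event orphan = { .owner = NULL };

    	printf("user:   kernel=%d orphan=%d\n",
    	       is_kernel_owned(&user), is_orphaned(&user));
    	printf("kernel: kernel=%d orphan=%d\n",
    	       is_kernel_owned(&kern), is_orphaned(&kern));
    	printf("orphan: kernel=%d orphan=%d\n",
    	       is_kernel_owned(&orphan), is_orphaned(&orphan));
    	return 0;
    }

This mirrors is_kernel_event()/is_orphaned_event() in the diff above:
inherit_event() refuses to clone an orphaned parent, and orphans_remove_work()
reaps the children it left behind.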