History log of /linux/tools/perf/tests/workloads/thloop.c (Results 1 – 25 of 38)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# a23e1966 15-Jul-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.11 merge window.


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 6f47c7ae 28-May-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.9' into next

Sync up with the mainline to bring in the new cleanup API.


Revision tags: v6.10-rc1
# 60a2f25d 16-May-2024 Tvrtko Ursulin <tursulin@ursulin.net>

Merge drm/drm-next into drm-intel-gt-next

Some display refactoring patches are needed in order to allow conflict-
less merging.

Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>


Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7
# 06d07429 29-Feb-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to get the drm_printer changes to drm-intel-next.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 2e21dee6 13-Mar-2024 Jiri Kosina <jkosina@suse.com>

Merge branch 'for-6.9/amd-sfh' into for-linus

- assorted fixes and optimizations for amd-sfh (Basavaraj Natikar)

Signed-off-by: Jiri Kosina <jkosina@suse.com>


Revision tags: v6.8-rc6, v6.8-rc5
# 41c177cf 11-Feb-2024 Rob Clark <robdclark@chromium.org>

Merge tag 'drm-misc-next-2024-02-08' into msm-next

Merge the drm-misc tree to uprev MSM CI.

Signed-off-by: Rob Clark <robdclark@chromium.org>


Revision tags: v6.8-rc4, v6.8-rc3
# 4db102dc 29-Jan-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-next into drm-misc-next

Kickstart 6.9 development cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.8-rc2
# be3382ec 23-Jan-2024 Lucas De Marchi <lucas.demarchi@intel.com>

Merge drm/drm-next into drm-xe-next

Sync to v6.8-rc1.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>


# 03c11eb3 14-Feb-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch

Conflicts:
arch/x86/include/asm/percpu.h
arch/x86/include/asm/text-patching.h

Signed-off-by: Ingo Molnar <mingo@k

Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch

Conflicts:
arch/x86/include/asm/percpu.h
arch/x86/include/asm/text-patching.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


# 42ac0be1 26-Jan-2024 Ingo Molnar <mingo@kernel.org>

Merge branch 'linus' into x86/mm, to refresh the branch and pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 06f609b3 25-Jan-2024 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f0b7a0d1 23-Jan-2024 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-hotfixes-stable


# cf79f291 22-Jan-2024 Maxime Ripard <mripard@kernel.org>

Merge v6.8-rc1 into drm-misc-fixes

Let's kickstart the 6.8 fix cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.8-rc1
# 9d64bf43 19-Jan-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
"Add Namhyung Kim as tools/perf/

Merge tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
"Add Namhyung Kim as tools/perf/ co-maintainer, we're taking turns
processing patches, switching roles from perf-tools to perf-tools-next
at each Linux release.

Data profiling:

- Associate samples that identify loads and stores with data
structures. This uses events available on Intel, AMD and others and
DWARF info:

# To get memory access samples in kernel for 1 second (on Intel)
$ perf mem record -a -K --ldlat=4 -- sleep 1

# Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
$ perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000' -- sleep 1

Then, amongst several modes of post processing, one can do things like:

$ perf report -s type,typeoff --hierarchy --group --stdio
...
#
# Samples: 10K of events 'cpu/mem-loads,ldlat=4/P, cpu/mem-stores/P, dummy:u'
# Event count (approx.): 602758064
#
# Overhead Data Type / Data Type Offset
# ........................... ............................
#
26.09% 3.28% 0.00% long unsigned int
26.09% 3.28% 0.00% long unsigned int +0 (no field)
18.48% 0.73% 0.00% struct page
10.83% 0.02% 0.00% struct page +8 (lru.next)
3.90% 0.28% 0.00% struct page +0 (flags)
3.45% 0.06% 0.00% struct page +24 (mapping)
0.25% 0.28% 0.00% struct page +48 (_mapcount.counter)
0.02% 0.06% 0.00% struct page +32 (index)
0.02% 0.00% 0.00% struct page +52 (_refcount.counter)
0.02% 0.01% 0.00% struct page +56 (memcg_data)
0.00% 0.01% 0.00% struct page +16 (lru.prev)
15.37% 17.54% 0.00% (stack operation)
15.37% 17.54% 0.00% (stack operation) +0 (no field)
11.71% 50.27% 0.00% (unknown)
11.71% 50.27% 0.00% (unknown) +0 (no field)

$ perf annotate --data-type
...
Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
============================================================================
samples offset size field
13 0 640 struct cfs_rq {
2 0 16 struct load_weight load {
2 0 8 unsigned long weight;
0 8 4 u32 inv_weight;
};
0 16 8 unsigned long runnable_weight;
0 24 4 unsigned int nr_running;
1 28 4 unsigned int h_nr_running;
...

$ perf annotate --data-type=page --group
Annotate type: 'struct page' in [kernel.kallsyms] (480 samples):
event[0] = cpu/mem-loads,ldlat=4/P
event[1] = cpu/mem-stores/P
event[2] = dummy:u
===================================================================================
samples offset size field
447 33 0 0 64 struct page {
108 8 0 0 8 long unsigned int flags;
319 13 0 8 40 union {
319 13 0 8 40 struct {
236 2 0 8 16 union {
236 2 0 8 16 struct list_head lru {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct {
236 1 0 8 8 void* __filler;
0 1 0 16 4 unsigned int mlock_count;
};
236 2 0 8 16 struct list_head buddy_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
236 2 0 8 16 struct list_head pcp_list {
236 1 0 8 8 struct list_head* next;
0 1 0 16 8 struct list_head* prev;
};
};
82 4 0 24 8 struct address_space* mapping;
1 7 0 32 8 union {
1 7 0 32 8 long unsigned int index;
1 7 0 32 8 long unsigned int share;
};
0 0 0 40 8 long unsigned int private;
};

This uses the existing annotate code, calling objdump to do the
disassembly, with improvements to avoid having this take too long,
but longer term a switch to a disassembler library, possibly
reusing code in the kernel will be pursued.

This is the initial implementation, please use it and report
impressions and bugs. Make sure the kernel-debuginfo packages match
the running kernel. The 'perf report' phase for non short perf.data
files may take a while.

There is a great article about it on LWN:

https://lwn.net/Articles/955709/ - "Data-type profiling for perf"

One last test I did while writing this text, on a AMD Ryzen 5950X,
using a distro kernel, while doing a simple 'find /' on an
otherwise idle system resulted in:

# uname -r
6.6.9-100.fc38.x86_64
# perf -vv | grep BPF_
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
bpf_skeletons: [ on ] # HAVE_BPF_SKEL
# rpm -qa | grep kernel-debuginfo
kernel-debuginfo-common-x86_64-6.6.9-100.fc38.x86_64
kernel-debuginfo-6.6.9-100.fc38.x86_64
#
# perf mem record -a --filter 'mem_op == load || mem_op == store, ip > 0x8000000000000000'
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.199 MB perf.data (2913 samples) ]
#
# ls -la perf.data
-rw-------. 1 root root 2346486 Jan 9 18:36 perf.data
# perf evlist
ibs_op//
dummy:u
# perf evlist -v
ibs_op//: type: 11, size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1
dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT, read_format: ID, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
#
# perf report -s type,typeoff --hierarchy --group --stdio
# Total Lost Samples: 0
#
# Samples: 2K of events 'ibs_op//, dummy:u'
# Event count (approx.): 1904553038
#
# Overhead Data Type / Data Type Offset
# ................... ............................
#
73.70% 0.00% (unknown)
73.70% 0.00% (unknown) +0 (no field)
3.01% 0.00% long unsigned int
3.00% 0.00% long unsigned int +0 (no field)
0.01% 0.00% long unsigned int +2 (no field)
2.73% 0.00% struct task_struct
1.71% 0.00% struct task_struct +52 (on_cpu)
0.38% 0.00% struct task_struct +2104 (rcu_read_unlock_special.b.blocked)
0.23% 0.00% struct task_struct +2100 (rcu_read_lock_nesting)
0.14% 0.00% struct task_struct +2384 ()
0.06% 0.00% struct task_struct +3096 (signal)
0.05% 0.00% struct task_struct +3616 (cgroups)
0.05% 0.00% struct task_struct +2344 (active_mm)
0.02% 0.00% struct task_struct +46 (flags)
0.02% 0.00% struct task_struct +2096 (migration_disabled)
0.01% 0.00% struct task_struct +24 (__state)
0.01% 0.00% struct task_struct +3956 (mm_cid_active)
0.01% 0.00% struct task_struct +1048 (cpus_ptr)
0.01% 0.00% struct task_struct +184 (se.group_node.next)
0.01% 0.00% struct task_struct +20 (thread_info.cpu)
0.00% 0.00% struct task_struct +104 (on_rq)
0.00% 0.00% struct task_struct +2456 (pid)
1.36% 0.00% struct module
0.59% 0.00% struct module +952 (kallsyms)
0.42% 0.00% struct module +0 (state)
0.23% 0.00% struct module +8 (list.next)
0.12% 0.00% struct module +216 (syms)
0.95% 0.00% struct inode
0.41% 0.00% struct inode +40 (i_sb)
0.22% 0.00% struct inode +0 (i_mode)
0.06% 0.00% struct inode +76 (i_rdev)
0.06% 0.00% struct inode +56 (i_security)
<SNIP>

perf top/report:

- Don't ignore job control, allowing control+Z + bg to work.

- Add s390 raw data interpretation for PAI (Processor Activity
Instrumentation) counters.

perf archive:

- Add new option '--all' to pack perf.data with DSOs.

- Add new option '--unpack' to expand tarballs.

Initialization speedups:

- Lazily initialize zstd streams to save memory when not using it.

- Lazily allocate/size mmap event copy.

- Lazy load kernel symbols in 'perf record'.

- Be lazier in allocating lost samples buffer in 'perf record'.

- Don't synthesize BPF events when disabled via the command line
(perf record --no-bpf-event).

Assorted improvements:

- Show note on AMD systems that the :p, :pp, :ppp and :P are all the
same, as IBS (Instruction Based Sampling) is used and it is
inherentely precise, not having levels of precision like in Intel
systems.

- When 'cycles' isn't available, fall back to the "task-clock" event
when not system wide, not to 'cpu-clock'.

- Add --debug-file option to redirect debug output, e.g.:

$ perf --debug-file /tmp/perf.log record -v true

- Shrink 'struct map' to under one cacheline by avoiding function
pointers for selecting if addresses are identity or DSO relative,
and using just a byte for some boolean struct members.

- Resolve the arch specific strerrno just once to use in
perf_env__arch_strerrno().

- Reduce memory for recording PERF_RECORD_LOST_SAMPLES event.

Assorted fixes:

- Fix the default 'perf top' usage on Intel hybrid systems, now it
starts with a browser showing the number of samples for Efficiency
(cpu_atom/cycles/P) and Performance (cpu_core/cycles/P). This
behaviour is similar on ARM64, with its respective set of
big.LITTLE processors.

- Fix segfault on build_mem_topology() error path.

- Fix 'perf mem' error on hybrid related to availability of mem event
in a PMU.

- Fix missing reference count gets (map, maps) in the db-export code.

- Avoid recursively taking env->bpf_progs.lock in the 'perf_env'
code.

- Use the newly introduced maps__for_each_map() to add missing
locking around iteration of 'struct map' entries.

- Parse NOTE segments until the build id is found, don't stop on the
first one, ELF files may have several such NOTE segments.

- Remove 'egrep' usage, its deprecated, use 'grep -E' instead.

- Warn first about missing libelf, not libbpf, that depends on
libelf.

- Use alternative to 'find ... -printf' as this isn't supported in
busybox.

- Address python 3.6 DeprecationWarning for string scapes.

- Fix memory leak in uniq() in libsubcmd.

- Fix man page formatting for 'perf lock'

- Fix some spelling mistakes.

perf tests:

- Fail shell tests that needs some symbol in perf itself if it is
stripped. These tests check if a symbol is resolved, if some hot
function is indeed detected by profiling, etc.

- The 'perf test sigtrap' test is currently failing on PREEMPT_RT,
skip it if sleeping spinlocks are detected (using BTF) and point to
the mailing list discussion about it. This test is also being
skipped on several architectures (powerpc, s390x, arm and aarch64)
due to other pending issues with intruction breakpoints.

- Adjust test case perf record offcpu profiling tests for s390.

- Fix 'Setup struct perf_event_attr' fails on s390 on z/VM guest,
addressing issues caused by the fallback from cycles to task-clock
done in this release.

- Fix mask for VG register in the user-regs test.

- Use shellcheck on 'perf test' shell scripts automatically to make
sure changes don't introduce things it flags as problematic.

- Add option to change objdump binary and allow it to be set via
'perf config'.

- Add basic 'perf script', 'perf list --json" and 'perf diff' tests.

- Basic branch counter support.

- Make DSO tests a suite rather than individual.

- Remove atomics from test_loop to avoid test failures.

- Fix call chain match on powerpc for the record+probe_libc_inet_pton
test.

- Improve Intel hybrid tests.

Vendor event files (JSON):

powerpc:

- Update datasource event name to fix duplicate events on IBM's
Power10.

- Add PVN for HX-C2000 CPU with Power8 Architecture.

Intel:

- Alderlake/rocketlake metric fixes.

- Update emeraldrapids events to v1.02.

- Update icelakex events to v1.23.

- Update sapphirerapids events to v1.17.

- Add skx, clx, icx and spr upi bandwidth metric.

AMD:

- Add Zen 4 memory controller events.

RISC-V:

- Add StarFive Dubhe-80 and Dubhe-90 JSON files.
https://www.starfivetech.com/en/site/cpu-u

- Add T-HEAD C9xx JSON file.
https://github.com/riscv-software-src/opensbi/blob/master/docs/platform/thead-c9xx.md

ARM64:

- Remove UTF-8 characters from cmn.json, that were causing build
failure in some distros.

- Add core PMU events and metrics for Ampere One X.

- Rename Ampere One's BPU_FLUSH_MEM_FAULT to GPC_FLUSH_MEM_FAULT

libperf:

- Rename several perf_cpu_map constructor names to clarify what they
really do.

- Ditto for some other methods, coping with some issues in their
semantics, like perf_cpu_map__empty() ->
perf_cpu_map__has_any_cpu_or_is_empty().

- Document perf_cpu_map__nr()'s behavior

perf stat:

- Exit if parse groups fails.

- Combine the -A/--no-aggr and --no-merge options.

- Fix help message for --metric-no-threshold option.

Hardware tracing:

ARM64 CoreSight:

- Bump minimum OpenCSD version to ensure a bugfix is present.

- Add 'T' itrace option for timestamp trace

- Set start vm addr of exectable file to 0 and don't ignore first
sample on the arm-cs-trace-disasm.py 'perf script'"

* tag 'perf-tools-for-v6.8-1-2024-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits)
MAINTAINERS: Add Namhyung as tools/perf/ co-maintainer
perf test: test case 'Setup struct perf_event_attr' fails on s390 on z/vm
perf db-export: Fix missing reference count get in call_path_from_sample()
perf tests: Add perf script test
libsubcmd: Fix memory leak in uniq()
perf TUI: Don't ignore job control
perf vendor events intel: Update sapphirerapids events to v1.17
perf vendor events intel: Update icelakex events to v1.23
perf vendor events intel: Update emeraldrapids events to v1.02
perf vendor events intel: Alderlake/rocketlake metric fixes
perf x86 test: Add hybrid test for conflicting legacy/sysfs event
perf x86 test: Update hybrid expectations
perf vendor events amd: Add Zen 4 memory controller events
perf stat: Fix hard coded LL miss units
perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event
perf env: Avoid recursively taking env->bpf_progs.lock
perf annotate: Add --insn-stat option for debugging
perf annotate: Add --type-stat option for debugging
perf annotate: Support event group display
perf annotate: Add --data-type option
...

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1
# 72b4ca7e 02-Nov-2023 Nick Forrington <nick.forrington@arm.com>

perf test: Remove atomics from test_loop to avoid test failures

The current use of atomics can lead to test failures, as tests (such as
tests/shell/record.sh) search for samples with "test_loop" as

perf test: Remove atomics from test_loop to avoid test failures

The current use of atomics can lead to test failures, as tests (such as
tests/shell/record.sh) search for samples with "test_loop" as the
top-most stack frame, but find frames related to the atomic operation
(e.g. __aarch64_ldadd4_relax).

This change simply removes the "count" variable, as it is not necessary.

Fixes: 1962ab6f6e0b39e4 ("perf test workload thloop: Make count increments atomic")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Nick Forrington <nick.forrington@arm.com>
Acked-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20231102162225.50028-1-nick.forrington@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


Revision tags: v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1
# 9a87ffc9 02-May-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.4 merge window.


# cdc780f0 26-Apr-2023 Jiri Kosina <jkosina@suse.cz>

Merge branch 'for-6.4/amd-sfh' into for-linus

- assorted functional fixes for amd-sfh driver (Basavaraj Natikar)


Revision tags: v6.3, v6.3-rc7
# ea68a3e9 11-Apr-2023 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in commit from drm-next (earlier in drm-intel-next):

1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")

In order to

Merge drm/drm-next into drm-intel-gt-next

Need to pull in commit from drm-next (earlier in drm-intel-next):

1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")

In order to merge following patch to drm-intel-gt-next:

https://patchwork.freedesktop.org/patch/530942/?series=114925&rev=6

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


Revision tags: v6.3-rc6, v6.3-rc5
# cecdd52a 28-Mar-2023 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catch up with 6.3-rc cycle...

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.3-rc4
# e752ab11 20-Mar-2023 Rob Clark <robdclark@chromium.org>

Merge remote-tracking branch 'drm/drm-next' into msm-next

Merge drm-next into msm-next to pick up external clk and PM dependencies
for improved a6xx GPU reset sequence.

Signed-off-by: Rob Clark <ro

Merge remote-tracking branch 'drm/drm-next' into msm-next

Merge drm-next into msm-next to pick up external clk and PM dependencies
for improved a6xx GPU reset sequence.

Signed-off-by: Rob Clark <robdclark@chromium.org>

show more ...


Revision tags: v6.3-rc3
# d26a3a6c 17-Mar-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.3-rc2' into next

Merge with mainline to get of_property_present() and other newer APIs.


# b3c9a041 13-Mar-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get latest upstream.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# a1eccc57 13-Mar-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.3-rc1 and sync with the other DRM trees.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.3-rc2, v6.3-rc1
# 0df82189 23-Feb-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:
"Miscellaneous:

- Add Ian Rogers

Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:
"Miscellaneous:

- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.

- Add support for retire latency feature (pipeline stall of a
instruction compared to the previous one, in cycles) present on
some Intel processors.

- Add 'perf c2c' report option to show false sharing with adjacent
cachelines, to be used in machines with cacheline prefetching,
where accesses to a cacheline brings the next one too.

- Skip 'perf test bpf' when the required kernel-debuginfo package
isn't installed.

- Avoid d3-flame-graph package dependency in 'perf script flamegraph',
making this feature more generally available.

- Add JSON metric events to present CPI stall cycles in Power10.

- Assorted improvements/refactorings on the JSON metrics parsing
code.

perf lock contention:

- Add -o/--lock-owner option:

$ sudo ./perf lock contention -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

Total time: 4.766 [sec]

4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner

403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome

The owner is unknown in most cases. Filtering only for the
mutex locks, it will more likely get the owners.

- -S/--callstack-filter is to limit display entries having the given
string in the callstack:

$ sudo ./perf lock contention -abv -S net sleep 1
...
contended total wait max wait avg wait type caller

5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd

Please note that to have the -b option (BPF) working above one has
to build with BUILD_BPF_SKEL=1.

- Add more 'perf test' entries to test these new features.

perf script:

- Add 'cgroup' field for 'perf script' output:

$ perf record --all-cgroups -- true
$ perf script -F comm,pid,cgroup
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...
true 337112 /user.slice/user-657345.slice/user@657345.service/...

- Add support for showing branch speculation information in 'perf
script' and in the 'perf report' raw dump (-D).

perf record:

- Fix 'perf record' segfault with --overwrite and --max-size.

perf test/bench:

- Switch basic BPF filtering test to use syscall tracepoint to avoid
the variable number of probes inserted when using the previous
probe point (do_epoll_wait) that happens on different CPU
architectures.

- Fix DWARF unwind test by adding non-inline to expected function in
a backtrace.

- Use 'grep -c' where the longer form 'grep | wc -l' was being used.

- Add getpid and execve benchmarks to 'perf bench syscall'.

Intel PT:

- Add support for synthesizing "cycle" events from Intel PT traces as
we support "instruction" events when Intel PT CYC packets are
available. This enables much more accurate profiles than when using
the regular 'perf record -e cycles' (the default) when the workload
lasts for very short periods (<10ms).

- .plt symbol handling improvements, better handling IBT (in the past
MPX) done in the context of decoding Intel PT processor traces,
IFUNC symbols on x86_64, static executables, understanding .plt.got
symbols on x86_64.

- Add a 'perf test' to test symbol resolution, part of the .plt
improvements series, this tests things like symbol size in contexts
where only the symbol start is available (kallsyms), etc.

- Better handle auxtrace/Intel PT data when using pipe mode (perf
record sleep 1|perf report).

- Fix symbol lookup with kcore with multiple segments match stext,
getting the symbol resolution to just show DSOs as unknown.

ARM:

- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
Macrocell v4).

- Ensure ARM64 CoreSight timestamps don't go backwards.

- Document that ARM64 SPE (Statistical Profiling Extension) is used
with 'perf c2c/mem'.

- Add raw decoding for ARM64 SPEv1.2 previous branch address.

- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
TLB, cache, branch, PE utilization and instruction mix metrics.

- Update decoder code for OpenCSD version 1.4, on ARM64 systems.

- Fix command line auto-complete of CPU events on aarch64.

Build:

- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked,
as several tests use tracepoints, those should be skipped.

- More fallout fixes for the removal of tools/lib/traceevent/.

- Fix build error when linking with libpfm"

* tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
perf intel-pt: Synthesize cycle events
perf c2c: Add report option to show false sharing in adjacent cachelines
perf record: Fix segfault with --overwrite and --max-size
perf stat: Avoid merging/aggregating metric counts twice
perf tools: Fix perf tool build error in util/pfm.c
perf tools: Fix auto-complete on aarch64
perf lock contention: Support old rw_semaphore type
perf lock contention: Add -o/--lock-owner option
perf lock contention: Fix to save callstack for the default modified
perf test bpf: Skip test if kernel-debuginfo is not present
perf probe: Update the exit error codes in function try_to_find_probe_trace_event
perf script: Fix missing Retire Latency fields option documentation
perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
perf test x86: Support the retire_lat (Retire Latency) sample_type check
perf test bpf: Check for libtraceevent support
perf script: Support Retire Latency
perf report: Support Retire Latency
perf lock contention: Support filters for different aggregation
...

show more ...


# 7ae9fb1b 21-Feb-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.3 merge window.


12