#
891e8abe |
| 22-Sep-2024 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'perf-tools-for-v6.12-1-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF + BTF to collect and
Merge tag 'perf-tools-for-v6.12-1-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF + BTF to collect and pretty print syscall and tracepoint arguments in 'perf trace', done as an GSoC activity
- Data-type profiling improvements:
- Cache debuginfo to speed up data type resolution
- Add the 'typecln' sort order, to show which cacheline in a target is hot or cold. The following shows members in the cfs_rq's first cache line:
$ perf report -s type,typecln,typeoff -H ... - 2.67% struct cfs_rq + 1.23% struct cfs_rq: cache-line 2 + 0.57% struct cfs_rq: cache-line 4 + 0.46% struct cfs_rq: cache-line 6 - 0.41% struct cfs_rq: cache-line 0 0.39% struct cfs_rq +0x14 (h_nr_running) 0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
- When a typedef resolves to a unnamed struct, use the typedef name
- When a struct has just one basic type field (int, etc), resolve the type sort order to the name of the struct, not the type of the field
- Support type folding/unfolding in the data-type annotation TUI
- Fix bitfields offsets and sizes
- Initial support for PowerPC, using libcapstone and the usual objdump disassembly parsing routines
- Add support for disassembling and addr2line using the LLVM libraries, speeding up those operations
- Support --addr2line option in 'perf script' as with other tools
- Intel branch counters (LBR event logging) support, only available in recent Intel processors, for instance, the new "brcntr" field can be asked from 'perf script' to print the information collected from this feature:
$ perf script -F +brstackinsn,+brcntr
# Branch counter abbr list: # branch-instructions:ppp = A # branch-misses = B # '-' No event occurs # '+' Event occurrences may be lost due to branch counter saturated tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit) f3+31: 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5] 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC 0000000000401766 insn: 8b 45 fc 0000000000401769 insn: 83 e0 01 000000000040176c insn: 85 c0 000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC 0000000000401776 insn: 83 45 fc 01 000000000040177a insn: 81 7d fc 0f 27 00 00 0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC
- Support Timed PEBS (Precise Event-Based Sampling), a recent hardware feature in Intel processors
- Add 'perf ftrace profile' subcommand, using ftrace's function-graph tracer so that users can see the total, average, max execution time as well as the number of invocations easily, for instance:
$ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \ perf stat -e cycles -C1 true 2> /dev/null | head # Total (us) Avg (us) Max (us) Count Function 65.611 65.611 65.611 1 __x64_sys_perf_event_open 30.527 30.527 30.527 1 anon_inode_getfile 30.260 30.260 30.260 1 __anon_inode_getfile 29.700 29.700 29.700 1 alloc_file_pseudo 17.578 17.578 17.578 1 d_alloc_pseudo 17.382 17.382 17.382 1 __d_alloc 16.738 16.738 16.738 1 kmem_cache_alloc_lru 15.686 15.686 15.686 1 perf_event_alloc 14.012 7.006 11.264 2 obj_cgroup_charge
- 'perf sched timehist' improvements, including the addition of priority showing/filtering command line options
- Varios improvements to the 'perf probe', including 'perf test' regression testings
- Introduce the 'perf check', initially to check if some feature is in place, using it in 'perf test'
- Various fixes for 32-bit systems
- Address more leak sanitizer failures
- Fix memory leaks (LBR, disasm lock ops, etc)
- More reference counting fixes (branch_info, etc)
- Constify 'struct perf_tool' parameters to improve code generation and reduce the chances of having its internals changed, which isn't expected
- More constifications in various other places
- Add more build tests, including for JEVENTS
- Add more 'perf test' entries ('perf record LBR', pipe/inject, --setup-filter, 'perf ftrace', 'cgroup sampling', etc)
- Inject build ids for all entries in a call chain in 'perf inject', not just for the main sample
- Improve the BPF based sample filter, allowing root to setup filters in bpffs that then can be used by non-root users
- Allow filtering by cgroups with the BPF based sample filter
- Allow a more compact way for 'perf mem report' using the -T/--type-profile and also provide a --sort option similar to the one in 'perf report', 'perf top', to setup the sort order manually
- Fix --group behavior in 'perf annotate' when leader has no samples, where it was not showing anything even when other events in the group had samples
- Fix spinlock and rwlock accounting in 'perf lock contention'
- Fix libsubcmd fixdep Makefile dependencies
- Improve 'perf ftrace' error message when ftrace isn't available
- Update various Intel JSON vendor event files
- ARM64 CoreSight hardware tracing infrastructure improvements, mostly not visible to users
- Update power10 JSON events
* tag 'perf-tools-for-v6.12-1-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (310 commits) perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space perf env: Find correct branch counter info on hybrid perf evlist: Print hint for group tools: Drop nonsensical -O6 perf pmu: To info add event_type_desc perf evsel: Add accessor for tool_event perf pmus: Fake PMU clean up perf list: Avoid potential out of bounds memory read perf help: Fix a typo ("bellow") perf ftrace: Detect whether ftrace is enabled on system perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex perf build: Require at least clang 16.0.6 to build BPF skeletons perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace perf parse-events: Remove duplicated include in parse-events.c perf callchain: Allow symbols to be optional when resolving a callchain perf inject: Lazy build-id mmap2 event insertion perf inject: Add new mmap2-buildid-all option perf inject: Fix build ID injection perf annotate-data: Add pr_debug_scope() ...
show more ...
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
#
1f2b7fbb |
| 13-Aug-2024 |
Kan Liang <kan.liang@linux.intel.com> |
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available f
perf annotate: Save branch counters for each block
When annotating a basic block, it's useful to display the occurrences of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64 space.
Because the saturation of a branch counter is small, e.g., for Intel Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter once saturated. That can be used to indicate a potential event lost because of the saturation.
Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240813160208.2493643-5-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
#
a23e1966 |
| 15-Jul-2024 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.11 merge window.
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
#
6f47c7ae |
| 28-May-2024 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.9' into next
Sync up with the mainline to bring in the new cleanup API.
|
Revision tags: v6.10-rc1 |
|
#
60a2f25d |
| 16-May-2024 |
Tvrtko Ursulin <tursulin@ursulin.net> |
Merge drm/drm-next into drm-intel-gt-next
Some display refactoring patches are needed in order to allow conflict- less merging.
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
|
Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
#
0ea5c948 |
| 15-Jan-2024 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Backmerge to bring Xe driver to drm-intel-next.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
#
03c11eb3 |
| 14-Feb-2024 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h
Signed-off-by: Ingo Molnar <mingo@k
Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
6b93f350 |
| 08-Jan-2024 |
Jiri Kosina <jkosina@suse.com> |
Merge branch 'for-6.8/amd-sfh' into for-linus
- addition of new interfaces to export User presence information and Ambient light from amd-sfh to other drivers within the kernel (Basavaraj Natika
Merge branch 'for-6.8/amd-sfh' into for-linus
- addition of new interfaces to export User presence information and Ambient light from amd-sfh to other drivers within the kernel (Basavaraj Natikar)
show more ...
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2 |
|
#
3bf3e21c |
| 15-Nov-2023 |
Maxime Ripard <mripard@kernel.org> |
Merge drm/drm-next into drm-misc-next
Let's kickstart the v6.8 release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
#
5d2d4a9f |
| 15-Nov-2023 |
Peter Zijlstra <peterz@infradead.org> |
Merge branch 'tip/perf/urgent'
Avoid conflicts, base on fixes.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
Revision tags: v6.7-rc1 |
|
#
7ab89417 |
| 03-Nov-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim: "Build:
- Compile BPF programs by defaul
Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim: "Build:
- Compile BPF programs by default if clang (>= 12.0.1) is available to enable more features like kernel lock contention, off-cpu profiling, kwork, sample filtering and so on.
This can be disabled by passing BUILD_BPF_SKEL=0 to make.
- Produce better error messages for bison on debug build (make DEBUG=1) by defining YYDEBUG symbol internally.
perf record:
- Track sideband events (like FORK/MMAP) from all CPUs even if perf record targets a subset of CPUs only (using -C option). Otherwise it may lose some information happened on a CPU out of the target list.
- Fix checking raw sched_switch tracepoint argument using system BTF. This affects off-cpu profiling which attaches a BPF program to the raw tracepoint.
perf lock contention:
- Add --lock-cgroup option to see contention by cgroups. This should be used with BPF only (using -b option).
$ sudo perf lock con -ab --lock-cgroup -- sleep 1 contended total wait max wait avg wait cgroup
835 14.06 ms 41.19 us 16.83 us /system.slice/led.service 25 122.38 us 13.77 us 4.89 us / 44 23.73 us 3.87 us 539 ns /user.slice/user-657345.slice/session-c4.scope 1 491 ns 491 ns 491 ns /system.slice/connectd.service
- Add -G/--cgroup-filter option to see contention only for given cgroups.
This can be useful when you identified a cgroup in the above command and want to investigate more on it. It also works with other output options like -t/--threads and -l/--lock-addr.
$ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1 contended total wait max wait avg wait type caller
8 77.11 us 17.98 us 9.64 us spinlock futex_wake+0xc8 2 24.56 us 14.66 us 12.28 us spinlock tick_do_update_jiffies64+0x25 1 4.97 us 4.97 us 4.97 us spinlock futex_q_lock+0x2a
- Use per-cpu array for better spinlock tracking. This is to improve performance of the BPF program and to avoid nested contention on a lock in the BPF hash map.
- Update callstack check for PowerPC. To find a representative caller of a lock, it needs to look up the call stacks. It ends the lookup when it sees 0 in the call stack buffer. However, PowerPC call stacks can have 0 values in the beginning so skip them when it expects valid call stacks after.
perf kwork:
- Support 'sched' class (for -k option) so that it can see task scheduling event (using sched_switch tracepoint) as well as irq and workqueue items.
- Add perf kwork top subcommand to show more accurate cpu utilization with sched class above. It works both with a recorded data (using perf kwork record command) and BPF (using -b option). Unlike perf top command, it does not support interactive mode (yet).
$ sudo perf kwork top -b -k sched Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 160702.425 ms, 8 cpus %Cpu(s): 36.00% id, 0.00% hi, 0.00% si %Cpu0 [|||||||||||||||||| 61.66%] %Cpu1 [|||||||||||||||||| 61.27%] %Cpu2 [||||||||||||||||||| 66.40%] %Cpu3 [|||||||||||||||||| 61.28%] %Cpu4 [|||||||||||||||||| 61.82%] %Cpu5 [||||||||||||||||||||||| 77.41%] %Cpu6 [|||||||||||||||||| 61.73%] %Cpu7 [|||||||||||||||||| 63.25%]
PID SPID %CPU RUNTIME COMMMAND ------------------------------------------------------------- 0 0 38.72 8089.463 ms [swapper/1] 0 0 38.71 8084.547 ms [swapper/3] 0 0 38.33 8007.532 ms [swapper/0] 0 0 38.26 7992.985 ms [swapper/6] 0 0 38.17 7971.865 ms [swapper/4] 0 0 36.74 7447.765 ms [swapper/7] 0 0 33.59 6486.942 ms [swapper/2] 0 0 22.58 3771.268 ms [swapper/5] 9545 9351 2.48 447.136 ms sched-messaging 9574 9351 2.09 418.583 ms sched-messaging 9724 9351 2.05 372.407 ms sched-messaging 9531 9351 2.01 368.804 ms sched-messaging 9512 9351 2.00 362.250 ms sched-messaging 9514 9351 1.95 357.767 ms sched-messaging 9538 9351 1.86 384.476 ms sched-messaging 9712 9351 1.84 386.490 ms sched-messaging 9723 9351 1.83 380.021 ms sched-messaging 9722 9351 1.82 382.738 ms sched-messaging 9517 9351 1.81 354.794 ms sched-messaging 9559 9351 1.79 344.305 ms sched-messaging 9725 9351 1.77 365.315 ms sched-messaging <SNIP>
- Add hard/soft-irq statistics to perf kwork top. This will show the total CPU utilization with IRQ stats like below:
$ sudo perf kwork top -b -k sched,irq,softirq Starting trace, Hit <Ctrl+C> to stop and report ^C Total : 12554.889 ms, 8 cpus %Cpu(s): 96.23% id, 0.10% hi, 0.19% si <---- here %Cpu0 [| 4.60%] %Cpu1 [| 4.59%] %Cpu2 [ 2.73%] %Cpu3 [| 3.81%] <SNIP>
perf bench:
- Add -G/--cgroups option to perf bench sched pipe. The pipe bench is good to measure context switch overhead. With this option, it puts the reader and writer tasks in separate cgroups to enforce context switch between two different cgroups.
Also it needs to set CPU affinity of the tasks in a CPU to accurately measure the impact of cgroup context switches.
$ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes
Total time: 0.307 [sec]
3.078180 usecs/op 324867 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':
200,026 context-switches 63 cgroup-switches
0.321637922 seconds time elapsed
You can see small number of cgroup-switches because both write and read tasks are in the same cgroup.
$ sudo mkdir /sys/fs/cgroup/{AAA,BBB}
$ sudo perf stat -e context-switches,cgroup-switches -- \ > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB # Running 'sched/pipe' benchmark: # Executed 100000 pipe operations between two processes
Total time: 0.351 [sec]
3.512990 usecs/op 284657 ops/sec
Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':
200,020 context-switches 200,019 cgroup-switches
0.365034567 seconds time elapsed
Now context-switches and cgroup-switches are almost same. And you can see the pipe operation took little more.
- Kill child processes when perf bench sched messaging exited abnormally. Otherwise it'd leave the child doing unnecessary work.
perf test:
- Fix various shellcheck issues on the tests written in shell script.
- Skip tests when condition is not satisfied: - object code reading test for non-text section addresses. - CoreSight test if cs_etm// event is not available. - lock contention test if not enough CPUs.
Event parsing:
- Make PMU alias name loading lazy to reduce the startup time in the event parsing code for perf record, stat and others in the general case.
- Lazily compute PMU default config. In the same sense, delay PMU initialization until it's really needed to reduce the startup cost.
- Fix event term values that are raw events. The event specification can have several terms including event name. But sometimes it clashes with raw event encoding which starts with 'r' and has hex-digits.
For example, an event named 'read' should be processed as a normal event but it was mis-treated as a raw encoding and caused a failure.
$ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1 event syntax error: '..nning/event=read/' \___ parser error Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
Event metrics:
- Add "Compat" regex to match event with multiple identifiers.
- Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.
Misc:
- Assorted memory leak fixes and footprint reduction.
- Add "bpf_skeletons" to perf version --build-options so that users can check whether their perf tools have BPF support easily.
- Fix unaligned access in Intel-PT packet decoder found by undefined-behavior sanitizer.
- Avoid frequency mode for the dummy event. Surprisingly it'd impact kernel timer tick handler performance by force iterating all PMU events.
- Update bash shell completion for events and metrics"
* tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits) perf vendor events intel: Update tsx_cycles_per_elision metrics perf vendor events intel: Update bonnell version number to v5 perf vendor events intel: Update westmereex events to v4 perf vendor events intel: Update meteorlake events to v1.06 perf vendor events intel: Update knightslanding events to v16 perf vendor events intel: Add typo fix for ivybridge FP perf vendor events intel: Update a spelling in haswell/haswellx perf vendor events intel: Update emeraldrapids to v1.01 perf vendor events intel: Update alderlake/alderlake events to v1.23 perf build: Disable BPF skeletons if clang version is < 12.0.1 perf callchain: Fix spelling mistake "statisitcs" -> "statistics" perf report: Fix spelling mistake "heirachy" -> "hierarchy" perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit() perf tests: test_arm_coresight: Simplify source iteration perf vendor events intel: Add tigerlake two metrics perf vendor events intel: Add broadwellde two metrics perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit perf callchain: Minor layout changes to callchain_list perf callchain: Make brtype_stat in callchain_list optional ...
show more ...
|
Revision tags: v6.6 |
|
#
d47d876d |
| 25-Oct-2023 |
Ian Rogers <irogers@google.com> |
perf callchain: Make display use of branch_type_stat const
Display code doesn't modify the branch_type_stat so switch uses to const. This is done to aid refactoring struct callchain_list where curre
perf callchain: Make display use of branch_type_stat const
Display code doesn't modify the branch_type_stat so switch uses to const. This is done to aid refactoring struct callchain_list where current the branch_type_stat is embedded even if not used.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: liuwenyu <liuwenyu7@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
show more ...
|
Revision tags: v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1 |
|
#
9a87ffc9 |
| 02-May-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.4 merge window.
|
#
cdc780f0 |
| 26-Apr-2023 |
Jiri Kosina <jkosina@suse.cz> |
Merge branch 'for-6.4/amd-sfh' into for-linus
- assorted functional fixes for amd-sfh driver (Basavaraj Natikar)
|
Revision tags: v6.3, v6.3-rc7 |
|
#
ea68a3e9 |
| 11-Apr-2023 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Need to pull in commit from drm-next (earlier in drm-intel-next):
1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")
In order to
Merge drm/drm-next into drm-intel-gt-next
Need to pull in commit from drm-next (earlier in drm-intel-next):
1eca0778f4b3 ("drm/i915: add struct i915_dsm to wrap dsm members together")
In order to merge following patch to drm-intel-gt-next:
https://patchwork.freedesktop.org/patch/530942/?series=114925&rev=6
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
show more ...
|
Revision tags: v6.3-rc6, v6.3-rc5 |
|
#
cecdd52a |
| 28-Mar-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catch up with 6.3-rc cycle...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v6.3-rc4 |
|
#
e752ab11 |
| 20-Mar-2023 |
Rob Clark <robdclark@chromium.org> |
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <ro
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
show more ...
|
Revision tags: v6.3-rc3 |
|
#
d26a3a6c |
| 17-Mar-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.3-rc2' into next
Merge with mainline to get of_property_present() and other newer APIs.
|
#
b3c9a041 |
| 13-Mar-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get latest upstream.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
a1eccc57 |
| 13-Mar-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v6.3-rc1 and sync with the other DRM trees.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.3-rc2, v6.3-rc1 |
|
#
0df82189 |
| 23-Feb-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo: "Miscellaneous:
- Add Ian Rogers
Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo: "Miscellaneous:
- Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
- Add support for retire latency feature (pipeline stall of a instruction compared to the previous one, in cycles) present on some Intel processors.
- Add 'perf c2c' report option to show false sharing with adjacent cachelines, to be used in machines with cacheline prefetching, where accesses to a cacheline brings the next one too.
- Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed.
- Avoid d3-flame-graph package dependency in 'perf script flamegraph', making this feature more generally available.
- Add JSON metric events to present CPI stall cycles in Power10.
- Assorted improvements/refactorings on the JSON metrics parsing code.
perf lock contention:
- Add -o/--lock-owner option:
$ sudo ./perf lock contention -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome
The owner is unknown in most cases. Filtering only for the mutex locks, it will more likely get the owners.
- -S/--callstack-filter is to limit display entries having the given string in the callstack:
$ sudo ./perf lock contention -abv -S net sleep 1 ... contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Please note that to have the -b option (BPF) working above one has to build with BUILD_BPF_SKEL=1.
- Add more 'perf test' entries to test these new features.
perf script:
- Add 'cgroup' field for 'perf script' output:
$ perf record --all-cgroups -- true $ perf script -F comm,pid,cgroup true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/...
- Add support for showing branch speculation information in 'perf script' and in the 'perf report' raw dump (-D).
perf record:
- Fix 'perf record' segfault with --overwrite and --max-size.
perf test/bench:
- Switch basic BPF filtering test to use syscall tracepoint to avoid the variable number of probes inserted when using the previous probe point (do_epoll_wait) that happens on different CPU architectures.
- Fix DWARF unwind test by adding non-inline to expected function in a backtrace.
- Use 'grep -c' where the longer form 'grep | wc -l' was being used.
- Add getpid and execve benchmarks to 'perf bench syscall'.
Intel PT:
- Add support for synthesizing "cycle" events from Intel PT traces as we support "instruction" events when Intel PT CYC packets are available. This enables much more accurate profiles than when using the regular 'perf record -e cycles' (the default) when the workload lasts for very short periods (<10ms).
- .plt symbol handling improvements, better handling IBT (in the past MPX) done in the context of decoding Intel PT processor traces, IFUNC symbols on x86_64, static executables, understanding .plt.got symbols on x86_64.
- Add a 'perf test' to test symbol resolution, part of the .plt improvements series, this tests things like symbol size in contexts where only the symbol start is available (kallsyms), etc.
- Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report).
- Fix symbol lookup with kcore with multiple segments match stext, getting the symbol resolution to just show DSOs as unknown.
ARM:
- Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace Macrocell v4).
- Ensure ARM64 CoreSight timestamps don't go backwards.
- Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'.
- Add raw decoding for ARM64 SPEv1.2 previous branch address.
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB, cache, branch, PE utilization and instruction mix metrics.
- Update decoder code for OpenCSD version 1.4, on ARM64 systems.
- Fix command line auto-complete of CPU events on aarch64.
Build:
- Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as several tests use tracepoints, those should be skipped.
- More fallout fixes for the removal of tools/lib/traceevent/.
- Fix build error when linking with libpfm"
* tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits) perf tests stat_all_metrics: Change true workload to sleep workload for system wide check perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc perf intel-pt: Synthesize cycle events perf c2c: Add report option to show false sharing in adjacent cachelines perf record: Fix segfault with --overwrite and --max-size perf stat: Avoid merging/aggregating metric counts twice perf tools: Fix perf tool build error in util/pfm.c perf tools: Fix auto-complete on aarch64 perf lock contention: Support old rw_semaphore type perf lock contention: Add -o/--lock-owner option perf lock contention: Fix to save callstack for the default modified perf test bpf: Skip test if kernel-debuginfo is not present perf probe: Update the exit error codes in function try_to_find_probe_trace_event perf script: Fix missing Retire Latency fields option documentation perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT perf test x86: Support the retire_lat (Retire Latency) sample_type check perf test bpf: Check for libtraceevent support perf script: Support Retire Latency perf report: Support Retire Latency perf lock contention: Support filters for different aggregation ...
show more ...
|
#
7ae9fb1b |
| 21-Feb-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.3 merge window.
|
Revision tags: v6.2, v6.2-rc8, v6.2-rc7 |
|
#
6ade6c64 |
| 02-Feb-2023 |
Sandipan Das <sandipan.das@amd.com> |
perf script: Show branch speculation info
Show the branch speculation info if provided by the branch recording hardware feature. This can be useful for optimizing code further.
The speculation info
perf script: Show branch speculation info
Show the branch speculation info if provided by the branch recording hardware feature. This can be useful for optimizing code further.
The speculation info is appended to the end of the list of fields so any existing tools that use "/" as a delimiter for access fields via an index remain unaffected. Also show "-" instead of "N/A" when speculation info is unavailable because "/" is used as the field separator.
E.g.
$ perf record -j any,u,save_type ./test_branch $ perf script --fields brstacksym
Before:
[...] check_match+0x60/strcmp+0x0/P/-/-/0/CALL do_lookup_x+0x3c5/check_match+0x0/P/-/-/0/CALL [...]
After:
[...] check_match+0x60/strcmp+0x0/P/-/-/0/CALL/NON_SPEC_CORRECT_PATH do_lookup_x+0x3c5/check_match+0x0/P/-/-/0/CALL/NON_SPEC_CORRECT_PATH [...]
The bitfield swapping scheme used duing sample parsing has changed because of the addition of new branch flags, namely "spec", "new_type" and "priv". Earlier, these were all part of the "reserved" field but now, each of these fields get swapped separately. Change the expected flag values accordingly for the test to pass.
E.g.
$ perf test -v 27
Before:
27: Sample parsing : --- start --- test child forked, pid 61979 parsing failed for sample_type 0x800 test child finished with -1 ---- end ---- Sample parsing: FAILED!
After:
27: Sample parsing : --- start --- test child forked, pid 63293 test child finished with 0 ---- end ---- Sample parsing: Ok
Signed-off-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: x86@kernel.org Link: https://lore.kernel.org/r/56e272583552526e999ba0b536ac009ae3613966.1675333809.git.sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v6.2-rc6, v6.2-rc5 |
|
#
6f849817 |
| 19-Jan-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get DRM accelerator infrastructure, which is required by ipuv driver.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.2-rc4 |
|
#
407da561 |
| 10-Jan-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.2-rc3' into next
Merge with mainline to bring in timer_shutdown_sync() API.
|