xref: /linux/Documentation/arch/powerpc/vpa-dtl.rst (revision 30bbcb44707a97fcb62246bebc8b413b5ab293f8)
1.. SPDX-License-Identifier: GPL-2.0
2.. _vpa-dtl:
3
4===================================
5DTL (Dispatch Trace Log)
6===================================
7
8Athira Rajeev, 19 April 2025
9
10.. contents::
11    :depth: 3
12
13
14Basic overview
15==============
16
17The pseries Shared Processor Logical Partition(SPLPAR) machines can
18retrieve a log of dispatch and preempt events from the hypervisor
19using data from Disptach Trace Log(DTL) buffer. With this information,
20user can retrieve when and why each dispatch & preempt has occurred.
21The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
22via perf.
23
24Infrastructure used
25===================
26
27The VPA DTL PMU counters do not interrupt on overflow or generate any
28PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
29nterval can be provided by user via sample_period field in nano seconds.
30vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
31Trace Log) contains information about dispatch/preempt, enqueue time etc.
32We directly copy the DTL buffer data as part of auxiliary buffer and it
33will be processed later. This will avoid time taken to create samples
34in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
35entries makes use of AUX support in perf infrastructure. On the tools side,
36this data is made available as PERF_RECORD_AUXTRACE records.
37
38To correlate each DTL entry with other events across CPU's, an auxtrace_queue
39is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
40All auxtrace queues is maintained in auxtrace heap. The queues are sorted
41based on timestamp. When the different PERF_RECORD_XX records are processed,
42compare the timestamp of perf record with timestamp of top element in the
43auxtrace heap so that DTL events can be co-related with other events
44Process the auxtrace queue if the timestamp of element from heap is
45lower than timestamp from entry in perf record. Sometimes it could happen that
46one buffer is only partially processed. if the timestamp of occurrence of
47another event is more than currently processed element in the queue, it will
48move on to next perf record. So keep track of position of buffer to continue
49processing next time. Update the timestamp of the auxtrace heap with the timestamp
50of last processed entry from the auxtrace buffer.
51
52This infrastructure ensures dispatch trace log entries can be correlated
53and presented along with other events like sched.
54
55vpa-dtl PMU example usage
56=========================
57
58.. code-block:: sh
59
60  # ls /sys/devices/vpa_dtl/
61  events  format  perf_event_mux_interval_ms  power  subsystem  type  uevent
62
63
64To capture the DTL data using perf record:
65.. code-block:: sh
66
67  # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
68
69The result can be interpreted using perf record. Snippet of perf report -D
70
71.. code-block:: sh
72
73  # ./perf report -D
74
75There are different PERF_RECORD_XX records. In that records corresponding to
76auxtrace buffers includes:
77
781. PERF_RECORD_AUX
79   Conveys that new data is available in AUX area
80
812. PERF_RECORD_AUXTRACE_INFO
82   Describes offset and size of auxtrace data in the buffers
83
843. PERF_RECORD_AUXTRACE
85   This is the record that defines the auxtrace data which here in case of
86   vpa-dtl pmu is dispatch trace log data.
87
88Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
89
90.. code-block:: sh
91
920 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690  offset: 0  ref: 0  idx: 0  tid: -1  cpu: 0
93.
94. ... VPA DTL PMU data: size 1680 bytes, entries is 35
95.  00000000: boot_tb: 21349649546353231, tb_freq: 512000000
96.  00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
97.  00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
98.  00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
99.  000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
100.  000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
101.  00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
102.  00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
103.  00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
104
105Above is representation of dtl entry of below format:
106
107struct dtl_entry {
108        u8      dispatch_reason;
109        u8      preempt_reason;
110        u16     processor_id;
111        u32     enqueue_to_dispatch_time;
112        u32     ready_to_enqueue_time;
113        u32     waiting_to_ready_time;
114        u64     timebase;
115        u64     fault_addr;
116        u64     srr0;
117        u64     srr1;
118
119};
120
121First two fields represent the dispatch reason and preempt reason. The post
122processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
123for user to consume.
124
125Visualize the dispatch trace log entries with perf report
126=========================================================
127
128.. code-block:: sh
129
130  # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
131  [ perf record: Woken up 1 times to write data ]
132  [ perf record: Captured and wrote 0.300 MB perf.data ]
133
134  # ./perf report
135  # Samples: 321  of event 'vpa-dtl'
136  # Event count (approx.): 321
137  #
138  # Children      Self  Command  Shared Object      Symbol
139  # ........  ........  .......  .................  ..............................
140  #
141     100.00%   100.00%  swapper  [kernel.kallsyms]  [k] plpar_hcall_norets_notrace
142
143Visualize the dispatch trace log entries with perf script
144=========================================================
145
146.. code-block:: sh
147
148   # ./perf script
149     migration/9      67 [009] 105373.359903:                     sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
150     migration/9      67 [009] 105373.359904:               sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
151     migration/9      67 [009] 105373.359907:               sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
152     migration/9      67 [009] 105373.359908:                     sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
153            :256     256 [016] 105373.359913:                                vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854,                        ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
154            :256     256 [017] 105373.360012:                                vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236,                         ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
155            perf   13418 [010] 105373.360048:               sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
156            perf   13418 [010] 105373.360052:                     sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010
157