1.. SPDX-License-Identifier: GPL-2.0 2.. _vpa-dtl: 3 4=================================== 5DTL (Dispatch Trace Log) 6=================================== 7 8Athira Rajeev, 19 April 2025 9 10.. contents:: 11 :depth: 3 12 13 14Basic overview 15============== 16 17The pseries Shared Processor Logical Partition(SPLPAR) machines can 18retrieve a log of dispatch and preempt events from the hypervisor 19using data from Disptach Trace Log(DTL) buffer. With this information, 20user can retrieve when and why each dispatch & preempt has occurred. 21The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters 22via perf. 23 24Infrastructure used 25=================== 26 27The VPA DTL PMU counters do not interrupt on overflow or generate any 28PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer 29nterval can be provided by user via sample_period field in nano seconds. 30vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch 31Trace Log) contains information about dispatch/preempt, enqueue time etc. 32We directly copy the DTL buffer data as part of auxiliary buffer and it 33will be processed later. This will avoid time taken to create samples 34in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL) 35entries makes use of AUX support in perf infrastructure. On the tools side, 36this data is made available as PERF_RECORD_AUXTRACE records. 37 38To correlate each DTL entry with other events across CPU's, an auxtrace_queue 39is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers. 40All auxtrace queues is maintained in auxtrace heap. The queues are sorted 41based on timestamp. When the different PERF_RECORD_XX records are processed, 42compare the timestamp of perf record with timestamp of top element in the 43auxtrace heap so that DTL events can be co-related with other events 44Process the auxtrace queue if the timestamp of element from heap is 45lower than timestamp from entry in perf record. Sometimes it could happen that 46one buffer is only partially processed. if the timestamp of occurrence of 47another event is more than currently processed element in the queue, it will 48move on to next perf record. So keep track of position of buffer to continue 49processing next time. Update the timestamp of the auxtrace heap with the timestamp 50of last processed entry from the auxtrace buffer. 51 52This infrastructure ensures dispatch trace log entries can be correlated 53and presented along with other events like sched. 54 55vpa-dtl PMU example usage 56========================= 57 58.. code-block:: sh 59 60 # ls /sys/devices/vpa_dtl/ 61 events format perf_event_mux_interval_ms power subsystem type uevent 62 63 64To capture the DTL data using perf record: 65.. code-block:: sh 66 67 # ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 68 69The result can be interpreted using perf record. Snippet of perf report -D 70 71.. code-block:: sh 72 73 # ./perf report -D 74 75There are different PERF_RECORD_XX records. In that records corresponding to 76auxtrace buffers includes: 77 781. PERF_RECORD_AUX 79 Conveys that new data is available in AUX area 80 812. PERF_RECORD_AUXTRACE_INFO 82 Describes offset and size of auxtrace data in the buffers 83 843. PERF_RECORD_AUXTRACE 85 This is the record that defines the auxtrace data which here in case of 86 vpa-dtl pmu is dispatch trace log data. 87 88Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump 89 90.. code-block:: sh 91 920 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0 93. 94. ... VPA DTL PMU data: size 1680 bytes, entries is 35 95. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000 96. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773 97. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437 98. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709 99. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243 100. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648 101. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446 102. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126 103. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665 104 105Above is representation of dtl entry of below format: 106 107struct dtl_entry { 108 u8 dispatch_reason; 109 u8 preempt_reason; 110 u16 processor_id; 111 u32 enqueue_to_dispatch_time; 112 u32 ready_to_enqueue_time; 113 u32 waiting_to_ready_time; 114 u64 timebase; 115 u64 fault_addr; 116 u64 srr0; 117 u64 srr1; 118 119}; 120 121First two fields represent the dispatch reason and preempt reason. The post 122processing of PERF_RECORD_AUXTRACE records will translate to meaningful data 123for user to consume. 124 125Visualize the dispatch trace log entries with perf report 126========================================================= 127 128.. code-block:: sh 129 130 # ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1 131 [ perf record: Woken up 1 times to write data ] 132 [ perf record: Captured and wrote 0.300 MB perf.data ] 133 134 # ./perf report 135 # Samples: 321 of event 'vpa-dtl' 136 # Event count (approx.): 321 137 # 138 # Children Self Command Shared Object Symbol 139 # ........ ........ ....... ................. .............................. 140 # 141 100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace 142 143Visualize the dispatch trace log entries with perf script 144========================================================= 145 146.. code-block:: sh 147 148 # ./perf script 149 migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009 150 migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10 151 migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns] 152 migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120 153 :256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) 154 :256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms]) 155 perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns] 156 perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010 157