.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the nest IMC counter data
and moves it to memory.

The Core and Thread IMC PMU counters are handled in the core. Core level PMU
counters give us the IMC counters' data per core and thread level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The event information
contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for those events must be
inherited from the PMU.

The event offset in the memory is where the counter data gets accumulated.

IMC catalog is available at:
    https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
===============

POWER9 supports two IMC modes: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register; it has bits to indicate
whether hardware is configured for accumulation or trace mode.
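
As an illustration of how the LDBAR bits described in the register layout
section that follows can be pulled apart, here is a minimal sketch. It assumes
IBM (MSB-0) bit numbering, where bit 0 is the most significant bit of the
64-bit value. The helper names `field` and `decode_ldbar` and the example
value are hypothetical and are not taken from the kernel sources.

.. code-block:: python

  def field(value, first, last, width=64):
      """Return bits first..last (inclusive) of value, MSB-0 numbering."""
      nbits = last - first + 1
      shift = width - 1 - last  # distance of the field's last bit from the LSB
      return (value >> shift) & ((1 << nbits) - 1)


  def decode_ldbar(value):
      """Split a 64-bit LDBAR value into the documented (non-reserved) fields."""
      return {
          "enable": field(value, 0, 0),
          "trace_mode": field(value, 1, 1),      # 0: accumulation, 1: trace
          "pb_scope": field(value, 4, 6),
          "counter_address": field(value, 8, 50),  # raw field value
      }


  # Made-up value for illustration only: enabled, trace mode selected,
  # PB scope 0, counter address field set to 0x1234.
  example = (1 << 63) | (1 << 62) | (0x1234 << 13)
  fields = decode_ldbar(example)

The same `field` helper could be applied to the TRACE_IMC_SCOM layout, again
under the assumption that the bit ranges use MSB-0 numbering.
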

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches the end.

*Currently, the event monitored in trace mode is fixed as cycles.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [...]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

Performance data using `perf top` with and without the trace-imc event is
shown below. The PMI counts are read before running `perf top`, after running
it without the trace-imc event, and again after running it with the trace-imc
event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI interrupt counts do not increment when using the `trace_imc` event.
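
The before/after comparison of the `PMI:` line can also be scripted. The
following is a minimal sketch, assuming the line format shown in the
`/proc/interrupts` output above (a `PMI:` label, one count per CPU, then a
textual description). The helper name `pmi_total` is hypothetical.

.. code-block:: python

  def pmi_total(interrupts_text):
      """Sum the per-CPU PMI counts found in /proc/interrupts-style text."""
      for line in interrupts_text.splitlines():
          label, _, rest = line.strip().partition(":")
          if label != "PMI":
              continue
          total = 0
          for token in rest.split():
              if not token.isdigit():
                  break  # reached the textual description
              total += int(token)
          return total
      raise ValueError("no PMI line found")


  # The two PMI lines captured around the trace_imc run shown above.
  before = "PMI:  39735  8710  17338  17801   Performance monitoring interrupts"
  after = "PMI:  39735  8710  17338  17801   Performance monitoring interrupts"
  delta = pmi_total(after) - pmi_total(before)  # 0: no PMIs were taken

On a live system, the two text snapshots would instead be read from
`/proc/interrupts` before and after the `perf top` run.
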