======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver registers perf PMU drivers such as L3C,
HHA and DDRC. The available events and configuration options are
described in sysfs, see::

  /sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>

The "perf list" command lists the available events from sysfs.

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name appears in the event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is the READ_HIT_CPIPE event of L3C index #0
in SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is the RX_OPERATIONS event of HHA index #0
in SCCL ID #1.

The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event. An "associated_cpus" sysfs attribute is
also provided to show the CPUs associated with this PMU. The "cpumask" indicates
the CPUs on which to open the events, usually as a hint for userspace tools
like perf. It only contains one associated CPU from the "associated_cpus".
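
The two attributes can be inspected directly from a shell. The following is
a minimal sketch, not part of the driver: the function name
``list_hisi_pmus`` and its optional argument (an alternative sysfs root, handy
for trying it on a fake directory tree) are assumptions made for illustration.

```shell
#!/bin/sh
# Print "name cpumask=... associated_cpus=..." for each HiSilicon uncore PMU.
# $1 (optional, hypothetical) overrides the sysfs root directory.
list_hisi_pmus() {
    root="${1:-/sys/bus/event_source/devices}"
    for dev in "$root"/hisi_sccl*; do
        # Skip the unexpanded glob when no HiSilicon PMU is present.
        [ -d "$dev" ] || continue
        cpumask=$(cat "$dev/cpumask" 2>/dev/null)
        assoc=$(cat "$dev/associated_cpus" 2>/dev/null)
        printf '%s cpumask=%s associated_cpus=%s\n' \
            "${dev##*/}" "$cpumask" "$assoc"
    done
}

list_hisi_pmus
```

On a machine without these PMUs the function prints nothing.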

Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5

For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.
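
The bitmap sets bit N for core/thread N. As a small sketch, the helper below
(``tt_core_mask`` is a hypothetical name, not a perf or driver facility)
derives the value from a list of indexes and prints the resulting command
instead of running it.

```shell
#!/bin/sh
# Build the tt_core bitmap from core/thread indexes within the cluster:
# bit N is set for core/thread N.
tt_core_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

# Print (do not run) the perf command for core/thread 0 and 1.
echo "perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=$(tt_core_mask 0 1)/ sleep 5"
```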

2. Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations; other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.
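
The tt_req codes above can be kept in a small lookup so that scripts do not
hard-code them. This is only a sketch: the function name and the operation
labels (``read``, ``write``, ``atomic-store``, ``atomic-non-store``) are
invented here for illustration, and the command is printed rather than run.

```shell
#!/bin/sh
# Map an operation label (hypothetical) to the 3-bit tt_req code.
tt_req_code() {
    case "$1" in
        read)             echo 0x4 ;;  # 3'b100
        write)            echo 0x5 ;;  # 3'b101
        atomic-store)     echo 0x6 ;;  # 3'b110
        atomic-non-store) echo 0x7 ;;  # 3'b111
        *) echo "unknown operation type: $1" >&2; return 1 ;;
    esac
}

# Print the perf command that would count only write operations.
echo "perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=$(tt_req_code write)/ sleep 5"
```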

3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

This is mainly useful for checking whether the data comes from the source
nearest to the CPU cores. If datasrc_cfg is used on a multi-chip system,
datasrc_skt shall be configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,\
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
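
As a sketch of how the datasrc codes listed above map to event strings, the
helpers below build one event per data source. The function names and the
source labels are hypothetical; ``config=0xb9`` and the PMU name are taken
from the example above. datasrc_skt is left out because its values are not
described here.

```shell
#!/bin/sh
# Map a data source label (hypothetical) to the 5-bit datasrc_cfg code.
datasrc_code() {
    case "$1" in
        l3c-local)        echo 0x01 ;;  # 5'b00001
        l3c-cross-die)    echo 0x08 ;;  # 5'b01000
        l3c-cross-socket) echo 0x09 ;;  # 5'b01001
        ddr-local)        echo 0x0e ;;  # 5'b01110
        ddr-cross-die)    echo 0x0f ;;  # 5'b01111
        ddr-cross-socket) echo 0x10 ;;  # 5'b10000
        *) echo "unknown data source: $1" >&2; return 1 ;;
    esac
}

# Join one event per requested data source into a comma-separated list.
datasrc_events() {
    events=""
    for src in "$@"; do
        code=$(datasrc_code "$src") || return 1
        events="${events:+$events,}hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=$code/"
    done
    echo "$events"
}

# Print (do not run) the resulting perf command.
echo "perf stat -a -e $(datasrc_events ddr-local ddr-cross-die) sleep 5"
```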

4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, including a 6-bit SCCL-ID and a 5-bit
CCL/ICL-ID. For an I/O die, the ICL-ID is one of the following:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;
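
The ID arithmetic can be sketched as follows. Note the assumption: the text
above only states the field widths, so placing the 6-bit SCCL-ID in the upper
bits and the 5-bit CCL/ICL-ID in the lower bits is a guess made for this
example, and ``make_id``/``split_id`` are hypothetical helper names.

```shell
#!/bin/sh
# ASSUMPTION: ID = (SCCL-ID << 5) | CCL/ICL-ID. The bit order is not
# confirmed by this document; only the field widths (6 + 5 bits) are.
make_id() {   # $1 = SCCL-ID, $2 = CCL/ICL-ID
    printf '0x%x\n' $(( (($1 & 0x3f) << 5) | ($2 & 0x1f) ))
}

split_id() {  # $1 = 11-bit ID
    printf 'sccl=%d ccl_icl=%d\n' $(( ($1 >> 5) & 0x3f )) $(( $1 & 0x1f ))
}

make_id 3 0x10    # SCCL-ID 3 with ICL-ID 5'b10000 (PCIe_ICL)
split_id 0x70
```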

5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
uring channel. It is 2 bits. Some important codes are as follows:

- 2'b11: count the events which are sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which are sent to the uring (non-MATA) channel;
- 2'b00: default value, count the events which are sent to both uring and
  uring_ext channels;

6. ch: NoC PMU supports filtering the event counts of a certain transaction
channel with this option. The currently supported channels are as follows:

- 3'b010: Request channel
- 3'b100: Snoop channel
- 3'b110: Response channel
- 3'b111: Data channel

7. tt_en: NoC PMU supports counting only transactions that have tracetag set
if this option is set. See item 2 above for more information about tracetag.

For HiSilicon uncore PMU v3 whose identifier is 0x40, some uncore PMUs are
further divided into parts for finer granularity of tracing. Each part has its
own dedicated PMU, and all such PMUs together cover the monitoring of events
on a particular uncore device. Such PMUs are described in sysfs with a slightly
changed name format::

  /sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}_{Z}/ddrc{Y}_{Z}/noc{Y}_{Z}>

Z is the sub-id, indicating different PMUs for part of the hardware device.
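
A script that handles both name formats has to treat the sub-id as optional.
The sketch below splits a PMU name into its fields; ``parse_pmu_name`` and its
output format are made up for this example, and a missing sub-id is shown as
``-``.

```shell
#!/bin/sh
# Split hisi_sccl{X}_<module>{Y}[_{Z}] into SCCL ID, module, index and
# optional sub-id.
parse_pmu_name() {
    name="${1#hisi_sccl}"
    # Reject names that do not start with the expected prefix.
    [ "$name" != "$1" ] || return 1
    sccl="${name%%_*}"      # text before the first underscore
    rest="${name#*_}"       # module, index and optional sub-id
    case "$rest" in
        *_*) sub="${rest#*_}"; rest="${rest%%_*}" ;;
        *)   sub="" ;;
    esac
    # The module name may itself contain digits (l3c), so strip only the
    # trailing digits to separate module and index.
    module=$(printf '%s' "$rest" | sed 's/[0-9]*$//')
    index="${rest#"$module"}"
    printf 'sccl=%s module=%s index=%s sub=%s\n' \
        "$sccl" "$module" "$index" "${sub:--}"
}

parse_pmu_name hisi_sccl3_l3c0
parse_pmu_name hisi_sccl0_l3c1_0
```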

Usage of most PMUs with different sub-ids is identical. As a special case, the
L3C PMU provides the ``ext`` option to allow exploration of even finer-grained
statistics of the L3C PMU. The L3C PMU driver uses it as a hint of termination
when delivering the perf command to hardware:

- ext=0: Default, can be used with event names.
- ext=1 and ext=2: Must be used with event codes; event names are not
  supported.

An example perf command could be::

  $# perf stat -a -e hisi_sccl0_l3c1_0/rd_spipe/ sleep 5

or::

  $# perf stat -a -e hisi_sccl0_l3c1_0/event=0x1,ext=1/ sleep 5

As above, ``hisi_sccl0_l3c1_0`` locates the PMU of Super CPU Cluster 0,
L3 cache 1, pipe 0.

The first command locates the first part of L3C since ``ext=0`` is implied by
default. The second command issues the counting on another part of L3C with
the event ``0x1``.
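
The rule that event names only work with ``ext=0`` can be checked before perf
is invoked. The helper below is a sketch under that rule; ``l3c_event_spec``
is a hypothetical name, and anything starting with a digit is treated as an
event code.

```shell
#!/bin/sh
# Build an event spec for an L3C PMU.
# $1 = PMU name, $2 = event name or numeric code, $3 = ext (default 0)
l3c_event_spec() {
    ext="${3:-0}"
    case "$2" in
        0x*|[0-9]*)
            # Event codes work with any ext value; ext=0 is the default
            # and is therefore omitted.
            if [ "$ext" -eq 0 ]; then
                echo "$1/event=$2/"
            else
                echo "$1/event=$2,ext=$ext/"
            fi ;;
        *)
            # Event names are only supported with ext=0.
            if [ "$ext" -ne 0 ]; then
                echo "event names require ext=0; use an event code" >&2
                return 1
            fi
            echo "$1/$2/" ;;
    esac
}

echo "perf stat -a -e $(l3c_event_spec hisi_sccl0_l3c1_0 rd_spipe) sleep 5"
echo "perf stat -a -e $(l3c_event_spec hisi_sccl0_l3c1_0 0x1 1) sleep 5"
```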

Users can configure IDs to count data coming from a specific CCL/ICL by
setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by
setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU
will not check that bit when matching against the srcid_cmd/tgtid_cmd.
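
The matching rule can be modelled in a few lines of shell arithmetic. This is
a software sketch of the described behaviour, not the hardware logic itself;
``id_matches`` is a hypothetical helper and the 11-bit width follows the ID
size given in item 4.

```shell
#!/bin/sh
# Succeed if an 11-bit ID passes the filter: bits set in the mask are
# ignored, every other bit must equal the corresponding bit of cmd.
id_matches() {  # $1 = ID, $2 = srcid_cmd/tgtid_cmd, $3 = srcid_msk/tgtid_msk
    [ $(( ($1 ^ $2) & ~$3 & 0x7ff )) -eq 0 ]
}

# With mask 0x1f the low five bits are not checked, so 0x70 matches 0x60.
if id_matches 0x70 0x60 0x1f; then echo "0x70 matches"; fi
# With mask 0 every bit is checked, so the same ID no longer matches.
if ! id_matches 0x70 0x60 0x0; then echo "0x70 does not match"; fi
```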

If none of these options is set, the PMU uses the default values, which do not
filter by any condition or ID information, and the PMU counters return the
total counts.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported, as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.