1 .. SPDX-License-Identifier: GPL-2.0
15 2.2.2 Per-thread mode
16 2.2.3 Per-CPU mode
17 2.2.4 System wide mode
19 2.3.1 Producer-consumer model
55 -------------------
63 +---------------------------+
65 +---------------------------+
66 `-> Tail `-> Head
86 read-only mapping, which is to be addressed in the section
92 +---------+---------+ +---------------------------------------+
94 +---------+---------+ +---------------------------------------+
95 ` `----------------^ ^
96 `----------------------------------------------|
103 with option ``-m`` or ``--mmap-pages=``, the given size will be rounded up
114 -------------------------------------------
117 mode, per cpu mode, and system wide mode. This section describes these
131 CPUs in the system and the profiled program's PID on the perf event, and
135 evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
142 than for all threads in the system. The *T1* thread represents the
144 threads in the system. The perf samples are exclusively collected for
151 +----+ +-----------+ +----+
153 +----+--------------+-----------+----------+----+-------->
156 +-----------------------------------------------------+
158 +-----------------------------------------------------+
161 +-----+
163 -----+-----+--------------------------------------------->
166 +-----------------------------------------------------+
168 +-----------------------------------------------------+
171 +----+ +-------+
173 --------------------------+----+--------+-------+-------->
176 +-----------------------------------------------------+
178 +-----------------------------------------------------+
181 +--------------+
183 -----------+--------------+------------------------------>
186 +-----------------------------------------------------+
188 +-----------------------------------------------------+
195 2.2.2 Per-thread mode
198 By specifying the option ``--per-thread`` in the perf command, e.g.
202 perf record --per-thread test_program
207 evsel::cpus::map[0] = { -1 }
221 +----+ +-----------+ +----+
223 +----+--------------+-----------+----------+----+-------->
226 | +-----+ |
228 --|--+-----+----------------------------------|---------->
231 | | +----+ +---+ |
233 --|-----|-----------------+----+--------+---+-|---------->
236 | | +--------------+ | |
238 --|-----|--+--------------+-|-----------------|---------->
241 +-----------------------------------------------------+
243 +-----------------------------------------------------+
248 Figure 4. Ring buffer for per-thread mode
250 When perf runs in per-thread mode, a ring buffer is allocated for the
256 2.2.3 Per-CPU mode
259 The option ``-C`` is used to collect samples on a list of CPUs; for
260 example, the perf command below receives the option ``-C 0,2``::
262 perf record -C 0,2 test_program
268 evsel::threads::map[] = { -1 }
275 options for per-thread mode and per-CPU mode, e.g. the options ``-C 0,2`` and
282 +----+ +-----------+ +----+
284 +----+--------------+-----------+----------+----+-------->
287 +-----------------------------------------------------+
289 +-----------------------------------------------------+
292 +-----+
294 -----+-----+--------------------------------------------->
297 +----+ +-------+
299 --------------------------+----+--------+-------+-------->
302 +-----------------------------------------------------+
304 +-----------------------------------------------------+
307 +--------------+
309 -----------+--------------+------------------------------>
314 Figure 5. Ring buffer for per-CPU mode
316 2.2.4 System wide mode
320 for all tasks; this is called the system wide mode. The command is::
322 perf record -a test_program
324 Similar to the per-CPU mode, the perf event doesn't bind to any PID, and
325 it maps to all CPUs in the system::
327 evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
328 evsel::threads::map[] = { -1 }
331 In the system wide mode, every CPU has its own ring buffer; all threads
338 +----+ +-----------+ +----+
340 +----+--------------+-----------+----------+----+-------->
343 +-----------------------------------------------------+
345 +-----------------------------------------------------+
348 +-----+
350 -----+-----+--------------------------------------------->
353 +-----------------------------------------------------+
355 +-----------------------------------------------------+
358 +----+ +-------+
360 --------------------------+----+--------+-------+-------->
363 +-----------------------------------------------------+
365 +-----------------------------------------------------+
368 +--------------+
370 -----------+--------------+------------------------------>
373 +-----------------------------------------------------+
375 +-----------------------------------------------------+
380 Figure 6. Ring buffer for system wide mode
383 --------------------
388 2.3.1 Producer-consumer model
394 data into the file for post analysis. It’s a typical producer-consumer
408 Polling / `--------------| Ring buffer
409 v v ;---------------------v
410 +----------------+ +---------+---------+ +-------------------+
412 +----------------+ +---------+---------+ +-------------------+
413 ^ ^ `------------------------^
415 +-----------------------------+
417 +-----------------------------+
445 Additionally, the tool can map buffers in either read-write mode or read-only
448 The ring buffer in the read-write mode is mapped with the property
455 Alternatively, in the read-only mode, only the kernel keeps updating
461 combinations to support buffer types: the non-overwrite buffer and the
464 .. list-table::
466 :header-rows: 1
468 * - Mapping mode
469 - Forward
470 - Backward
471 * - read-write
472 - Non-overwrite ring buffer
473 - Not used
474 * - read-only
475 - Not used
476 - Overwritable ring buffer
478 The non-overwrite ring buffer uses the read-write mapping with forward
480 and wrap around on overflow, which is used with the read-write mode in
487 read-only mode. It saves the data from the end of the ring buffer and
542 if (LOAD ->data_tail) { LOAD ->data_head
546 STORE ->data_head STORE ->data_tail
575 Some architectures support one-way permeable barriers with load-acquire
576 and store-release operations; these barriers are more relaxed, with less
580 If an architecture doesn’t support load-acquire and store-release in its
593 examine how the AUX ring buffer cooperates with the regular ring buffer,
598 ---------------------------------------------------------
619 During the initialisation phase, besides the mmap()-ed regular ring
622 non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates pages
630 perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
639 ring buffer and the AUX ring buffer per CPU, which is the same as
640 the system wide mode; however, the default mode records samples only for
642 in the system. For per-thread mode, the perf tool allocates only one
644 the per-CPU mode, the perf tool allocates two kinds of ring buffers for
645 selected CPUs specified by the option ``-C``.
647 The figure below demonstrates the buffers' layout in the system wide
655 +----+ +-----------+ +----+
657 +----+--------------+-----------+----------+----+-------->
660 +-----------------------------------------------------+
662 +-----------------------------------------------------+
665 +-----------------------------------------------------+
667 +-----------------------------------------------------+
670 +-----+
672 -----+-----+--------------------------------------------->
675 +-----------------------------------------------------+
677 +-----------------------------------------------------+
680 +-----------------------------------------------------+
682 +-----------------------------------------------------+
685 +----+ +-------+
687 --------------------------+----+--------+-------+-------->
690 +-----------------------------------------------------+
692 +-----------------------------------------------------+
695 +-----------------------------------------------------+
697 +-----------------------------------------------------+
700 +--------------+
702 -----------+--------------+------------------------------>
705 +-----------------------------------------------------+
707 +-----------------------------------------------------+
710 +-----------------------------------------------------+
712 +-----------------------------------------------------+
717 Figure 8. AUX ring buffer for system wide mode
720 --------------
738 - It fills an event ``PERF_RECORD_AUX`` into the regular ring buffer, this
742 - Since the hardware trace driver has stored new trace data into the AUX
762 -----------------
769 perf record -e cs_etm/@tmc_etr0/u -S -a program &
772 kill -USR2 $PERFPID
778 - Before a snapshot is taken, the AUX ring buffer acts in free run mode.
782 - Once the perf tool receives the *USR2* signal, it triggers the callback
788 - Then the perf tool takes a snapshot, ``record__read_auxtrace_snapshot()``
792 - After the snapshot is finished, ``auxtrace_record::snapshot_finish()``
804 As we know, the buffers' deployment can be per-thread mode, per-CPU
805 mode, or system wide mode, and the snapshot can be applied to any of
806 these modes. Below is an example of taking a snapshot with system wide
814 +------------------------+
815 | AUX Ring buffer 0 | <- aux_head
816 +------------------------+
818 +--------------------------------+
819 | AUX Ring buffer 1 | <- aux_head
820 +--------------------------------+
822 +--------------------------------------------+
823 | AUX Ring buffer 2 | <- aux_head
824 +--------------------------------------------+
826 +---------------------------------------+
827 | AUX Ring buffer 3 | <- aux_head
828 +---------------------------------------+
830 Figure 9. Snapshot with system wide mode