xref: /linux/tools/perf/Documentation/examples.txt (revision d0034a7a4ac7fae708146ac0059b9c47a1543f0d)
12932cffcSCarlos R. Mafra
22932cffcSCarlos R. Mafra		------------------------------
32932cffcSCarlos R. Mafra		****** perf by examples ******
42932cffcSCarlos R. Mafra		------------------------------
52932cffcSCarlos R. Mafra
6*6edfd0ebSKees Cook[ From an e-mail by Ingo Molnar, https://lore.kernel.org/lkml/20090804195717.GA5998@elte.hu ]
72932cffcSCarlos R. Mafra
82932cffcSCarlos R. Mafra
92932cffcSCarlos R. MafraFirst, discovery/enumeration of available counters can be done via
102932cffcSCarlos R. Mafra'perf list':
112932cffcSCarlos R. Mafra
122932cffcSCarlos R. Mafratitan:~> perf list
132932cffcSCarlos R. Mafra  [...]
142932cffcSCarlos R. Mafra  kmem:kmalloc                             [Tracepoint event]
152932cffcSCarlos R. Mafra  kmem:kmem_cache_alloc                    [Tracepoint event]
162932cffcSCarlos R. Mafra  kmem:kmalloc_node                        [Tracepoint event]
172932cffcSCarlos R. Mafra  kmem:kmem_cache_alloc_node               [Tracepoint event]
182932cffcSCarlos R. Mafra  kmem:kfree                               [Tracepoint event]
192932cffcSCarlos R. Mafra  kmem:kmem_cache_free                     [Tracepoint event]
2090a5d5afSKonstantin Khlebnikov  kmem:mm_page_free                        [Tracepoint event]
2190a5d5afSKonstantin Khlebnikov  kmem:mm_page_free_batched                [Tracepoint event]
222932cffcSCarlos R. Mafra  kmem:mm_page_alloc                       [Tracepoint event]
232932cffcSCarlos R. Mafra  kmem:mm_page_alloc_zone_locked           [Tracepoint event]
242932cffcSCarlos R. Mafra  kmem:mm_page_pcpu_drain                  [Tracepoint event]
252932cffcSCarlos R. Mafra  kmem:mm_page_alloc_extfrag               [Tracepoint event]
262932cffcSCarlos R. Mafra
272932cffcSCarlos R. MafraThen any (or all) of the above event sources can be activated and
282932cffcSCarlos R. Maframeasured. For example the page alloc/free properties of a 'hackbench
292932cffcSCarlos R. Mafrarun' are:
302932cffcSCarlos R. Mafra
312932cffcSCarlos R. Mafra titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc
3290a5d5afSKonstantin Khlebnikov -e kmem:mm_page_free_batched -e kmem:mm_page_free ./hackbench 10
332932cffcSCarlos R. Mafra Time: 0.575
342932cffcSCarlos R. Mafra
352932cffcSCarlos R. Mafra Performance counter stats for './hackbench 10':
362932cffcSCarlos R. Mafra
372932cffcSCarlos R. Mafra          13857  kmem:mm_page_pcpu_drain
382932cffcSCarlos R. Mafra          27576  kmem:mm_page_alloc
3990a5d5afSKonstantin Khlebnikov           6025  kmem:mm_page_free_batched
4090a5d5afSKonstantin Khlebnikov          20934  kmem:mm_page_free
412932cffcSCarlos R. Mafra
422932cffcSCarlos R. Mafra    0.613972165  seconds time elapsed
432932cffcSCarlos R. Mafra
442932cffcSCarlos R. MafraYou can observe the statistical properties as well, by using the
452932cffcSCarlos R. Mafra'repeat the workload N times' feature of perf stat:
462932cffcSCarlos R. Mafra
472932cffcSCarlos R. Mafra titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e
4890a5d5afSKonstantin Khlebnikov   kmem:mm_page_alloc -e kmem:mm_page_free_batched -e
4990a5d5afSKonstantin Khlebnikov   kmem:mm_page_free ./hackbench 10
502932cffcSCarlos R. Mafra Time: 0.627
512932cffcSCarlos R. Mafra Time: 0.644
522932cffcSCarlos R. Mafra Time: 0.564
532932cffcSCarlos R. Mafra Time: 0.559
542932cffcSCarlos R. Mafra Time: 0.626
552932cffcSCarlos R. Mafra
562932cffcSCarlos R. Mafra Performance counter stats for './hackbench 10' (5 runs):
572932cffcSCarlos R. Mafra
582932cffcSCarlos R. Mafra          12920  kmem:mm_page_pcpu_drain    ( +-   3.359% )
592932cffcSCarlos R. Mafra          25035  kmem:mm_page_alloc         ( +-   3.783% )
6090a5d5afSKonstantin Khlebnikov           6104  kmem:mm_page_free_batched  ( +-   0.934% )
6190a5d5afSKonstantin Khlebnikov          18376  kmem:mm_page_free	    ( +-   4.941% )
622932cffcSCarlos R. Mafra
632932cffcSCarlos R. Mafra    0.643954516  seconds time elapsed   ( +-   2.363% )
642932cffcSCarlos R. Mafra
652932cffcSCarlos R. MafraFurthermore, these tracepoints can be used to sample the workload as
662932cffcSCarlos R. Mafrawell. For example the page allocations done by a 'git gc' can be
672932cffcSCarlos R. Mafracaptured the following way:
682932cffcSCarlos R. Mafra
694a4d371aSJiri Olsa titan:~/git> perf record -e kmem:mm_page_alloc -c 1 ./git gc
702932cffcSCarlos R. Mafra Counting objects: 1148, done.
712932cffcSCarlos R. Mafra Delta compression using up to 2 threads.
722932cffcSCarlos R. Mafra Compressing objects: 100% (450/450), done.
732932cffcSCarlos R. Mafra Writing objects: 100% (1148/1148), done.
742932cffcSCarlos R. Mafra Total 1148 (delta 690), reused 1148 (delta 690)
752932cffcSCarlos R. Mafra [ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ]
762932cffcSCarlos R. Mafra
772932cffcSCarlos R. MafraTo check which functions generated page allocations:
782932cffcSCarlos R. Mafra
792932cffcSCarlos R. Mafra titan:~/git> perf report
802932cffcSCarlos R. Mafra # Samples: 10646
812932cffcSCarlos R. Mafra #
822932cffcSCarlos R. Mafra # Overhead          Command               Shared Object
832932cffcSCarlos R. Mafra # ........  ...............  ..........................
842932cffcSCarlos R. Mafra #
852932cffcSCarlos R. Mafra    23.57%       git-repack  /lib64/libc-2.5.so
862932cffcSCarlos R. Mafra    21.81%              git  /lib64/libc-2.5.so
872932cffcSCarlos R. Mafra    14.59%              git  ./git
882932cffcSCarlos R. Mafra    11.79%       git-repack  ./git
892932cffcSCarlos R. Mafra     7.12%              git  /lib64/ld-2.5.so
902932cffcSCarlos R. Mafra     3.16%       git-repack  /lib64/libpthread-2.5.so
912932cffcSCarlos R. Mafra     2.09%       git-repack  /bin/bash
922932cffcSCarlos R. Mafra     1.97%               rm  /lib64/libc-2.5.so
932932cffcSCarlos R. Mafra     1.39%               mv  /lib64/ld-2.5.so
942932cffcSCarlos R. Mafra     1.37%               mv  /lib64/libc-2.5.so
952932cffcSCarlos R. Mafra     1.12%       git-repack  /lib64/ld-2.5.so
962932cffcSCarlos R. Mafra     0.95%               rm  /lib64/ld-2.5.so
972932cffcSCarlos R. Mafra     0.90%  git-update-serv  /lib64/libc-2.5.so
982932cffcSCarlos R. Mafra     0.73%  git-update-serv  /lib64/ld-2.5.so
992932cffcSCarlos R. Mafra     0.68%             perf  /lib64/libpthread-2.5.so
1002932cffcSCarlos R. Mafra     0.64%       git-repack  /usr/lib64/libz.so.1.2.3
1012932cffcSCarlos R. Mafra
1022932cffcSCarlos R. MafraOr to see it on a more finegrained level:
1032932cffcSCarlos R. Mafra
1042932cffcSCarlos R. Mafratitan:~/git> perf report --sort comm,dso,symbol
1052932cffcSCarlos R. Mafra# Samples: 10646
1062932cffcSCarlos R. Mafra#
1072932cffcSCarlos R. Mafra# Overhead          Command               Shared Object  Symbol
1082932cffcSCarlos R. Mafra# ........  ...............  ..........................  ......
1092932cffcSCarlos R. Mafra#
1102932cffcSCarlos R. Mafra     9.35%       git-repack  ./git                       [.] insert_obj_hash
1112932cffcSCarlos R. Mafra     9.12%              git  ./git                       [.] insert_obj_hash
1122932cffcSCarlos R. Mafra     7.31%              git  /lib64/libc-2.5.so          [.] memcpy
1132932cffcSCarlos R. Mafra     6.34%       git-repack  /lib64/libc-2.5.so          [.] _int_malloc
1142932cffcSCarlos R. Mafra     6.24%       git-repack  /lib64/libc-2.5.so          [.] memcpy
1152932cffcSCarlos R. Mafra     5.82%       git-repack  /lib64/libc-2.5.so          [.] __GI___fork
1162932cffcSCarlos R. Mafra     5.47%              git  /lib64/libc-2.5.so          [.] _int_malloc
1172932cffcSCarlos R. Mafra     2.99%              git  /lib64/libc-2.5.so          [.] memset
1182932cffcSCarlos R. Mafra
1192932cffcSCarlos R. MafraFurthermore, call-graph sampling can be done too, of page
1202932cffcSCarlos R. Mafraallocations - to see precisely what kind of page allocations there
1212932cffcSCarlos R. Mafraare:
1222932cffcSCarlos R. Mafra
1234a4d371aSJiri Olsa titan:~/git> perf record -g -e kmem:mm_page_alloc -c 1 ./git gc
1242932cffcSCarlos R. Mafra Counting objects: 1148, done.
1252932cffcSCarlos R. Mafra Delta compression using up to 2 threads.
1262932cffcSCarlos R. Mafra Compressing objects: 100% (450/450), done.
1272932cffcSCarlos R. Mafra Writing objects: 100% (1148/1148), done.
1282932cffcSCarlos R. Mafra Total 1148 (delta 690), reused 1148 (delta 690)
1292932cffcSCarlos R. Mafra [ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ]
1302932cffcSCarlos R. Mafra
1312932cffcSCarlos R. Mafra titan:~/git> perf report -g
1322932cffcSCarlos R. Mafra # Samples: 10686
1332932cffcSCarlos R. Mafra #
1342932cffcSCarlos R. Mafra # Overhead          Command               Shared Object
1352932cffcSCarlos R. Mafra # ........  ...............  ..........................
1362932cffcSCarlos R. Mafra #
1372932cffcSCarlos R. Mafra    23.25%       git-repack  /lib64/libc-2.5.so
1382932cffcSCarlos R. Mafra                |
1392932cffcSCarlos R. Mafra                |--50.00%-- _int_free
1402932cffcSCarlos R. Mafra                |
1412932cffcSCarlos R. Mafra                |--37.50%-- __GI___fork
1422932cffcSCarlos R. Mafra                |          make_child
1432932cffcSCarlos R. Mafra                |
1442932cffcSCarlos R. Mafra                |--12.50%-- ptmalloc_unlock_all2
1452932cffcSCarlos R. Mafra                |          make_child
1462932cffcSCarlos R. Mafra                |
1472932cffcSCarlos R. Mafra                 --6.25%-- __GI_strcpy
1482932cffcSCarlos R. Mafra    21.61%              git  /lib64/libc-2.5.so
1492932cffcSCarlos R. Mafra                |
1502932cffcSCarlos R. Mafra                |--30.00%-- __GI_read
1512932cffcSCarlos R. Mafra                |          |
1522932cffcSCarlos R. Mafra                |           --83.33%-- git_config_from_file
1532932cffcSCarlos R. Mafra                |                     git_config
1542932cffcSCarlos R. Mafra                |                     |
1552932cffcSCarlos R. Mafra   [...]
1562932cffcSCarlos R. Mafra
1572932cffcSCarlos R. MafraOr you can observe the whole system's page allocations for 10
1582932cffcSCarlos R. Mafraseconds:
1592932cffcSCarlos R. Mafra
1602932cffcSCarlos R. Mafratitan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e
16190a5d5afSKonstantin Khlebnikovkmem:mm_page_alloc -e kmem:mm_page_free_batched -e
16290a5d5afSKonstantin Khlebnikovkmem:mm_page_free sleep 10
1632932cffcSCarlos R. Mafra
1642932cffcSCarlos R. Mafra Performance counter stats for 'sleep 10':
1652932cffcSCarlos R. Mafra
1662932cffcSCarlos R. Mafra         171585  kmem:mm_page_pcpu_drain
1672932cffcSCarlos R. Mafra         322114  kmem:mm_page_alloc
16890a5d5afSKonstantin Khlebnikov          73623  kmem:mm_page_free_batched
16990a5d5afSKonstantin Khlebnikov         254115  kmem:mm_page_free
1702932cffcSCarlos R. Mafra
1712932cffcSCarlos R. Mafra   10.000591410  seconds time elapsed
1722932cffcSCarlos R. Mafra
1732932cffcSCarlos R. MafraOr observe how fluctuating the page allocations are, via statistical
1742932cffcSCarlos R. Mafraanalysis done over ten 1-second intervals:
1752932cffcSCarlos R. Mafra
1762932cffcSCarlos R. Mafra titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e
17790a5d5afSKonstantin Khlebnikov   kmem:mm_page_alloc -e kmem:mm_page_free_batched -e
17890a5d5afSKonstantin Khlebnikov   kmem:mm_page_free sleep 1
1792932cffcSCarlos R. Mafra
1802932cffcSCarlos R. Mafra Performance counter stats for 'sleep 1' (10 runs):
1812932cffcSCarlos R. Mafra
1822932cffcSCarlos R. Mafra          17254  kmem:mm_page_pcpu_drain    ( +-   3.709% )
1832932cffcSCarlos R. Mafra          34394  kmem:mm_page_alloc         ( +-   4.617% )
18490a5d5afSKonstantin Khlebnikov           7509  kmem:mm_page_free_batched  ( +-   4.820% )
18590a5d5afSKonstantin Khlebnikov          25653  kmem:mm_page_free	    ( +-   3.672% )
1862932cffcSCarlos R. Mafra
1872932cffcSCarlos R. Mafra    1.058135029  seconds time elapsed   ( +-   3.089% )
1882932cffcSCarlos R. Mafra
1892932cffcSCarlos R. MafraOr you can annotate the recorded 'git gc' run on a per symbol basis
1902932cffcSCarlos R. Mafraand check which instructions/source-code generated page allocations:
1912932cffcSCarlos R. Mafra
1922932cffcSCarlos R. Mafra titan:~/git> perf annotate __GI___fork
1932932cffcSCarlos R. Mafra ------------------------------------------------
1942932cffcSCarlos R. Mafra  Percent |      Source code & Disassembly of libc-2.5.so
1952932cffcSCarlos R. Mafra ------------------------------------------------
1962932cffcSCarlos R. Mafra          :
1972932cffcSCarlos R. Mafra          :
1982932cffcSCarlos R. Mafra          :      Disassembly of section .plt:
1992932cffcSCarlos R. Mafra          :      Disassembly of section .text:
2002932cffcSCarlos R. Mafra          :
2012932cffcSCarlos R. Mafra          :      00000031a2e95560 <__fork>:
2022932cffcSCarlos R. Mafra [...]
2032932cffcSCarlos R. Mafra     0.00 :        31a2e95602:   b8 38 00 00 00          mov    $0x38,%eax
2042932cffcSCarlos R. Mafra     0.00 :        31a2e95607:   0f 05                   syscall
2052932cffcSCarlos R. Mafra    83.42 :        31a2e95609:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
2062932cffcSCarlos R. Mafra     0.00 :        31a2e9560f:   0f 87 4d 01 00 00       ja     31a2e95762 <__fork+0x202>
2072932cffcSCarlos R. Mafra     0.00 :        31a2e95615:   85 c0                   test   %eax,%eax
2082932cffcSCarlos R. Mafra
2092932cffcSCarlos R. Mafra( this shows that 83.42% of __GI___fork's page allocations come from
2102932cffcSCarlos R. Mafra  the 0x38 system call it performs. )
2112932cffcSCarlos R. Mafra
2122932cffcSCarlos R. Mafraetc. etc. - a lot more is possible. I could list a dozen of
2132932cffcSCarlos R. Mafraother different usecases straight away - neither of which is
2142932cffcSCarlos R. Mafrapossible via /proc/vmstat.
2152932cffcSCarlos R. Mafra
2162932cffcSCarlos R. Mafra/proc/vmstat is not in the same league really, in terms of
2172932cffcSCarlos R. Mafraexpressive power of system analysis and performance
2182932cffcSCarlos R. Mafraanalysis.
2192932cffcSCarlos R. Mafra
2202932cffcSCarlos R. MafraAll that the above results needed were those new tracepoints
2212932cffcSCarlos R. Mafrain include/tracing/events/kmem.h.
2222932cffcSCarlos R. Mafra
2232932cffcSCarlos R. Mafra	Ingo
2242932cffcSCarlos R. Mafra
2252932cffcSCarlos R. Mafra
226