xref: /linux/tools/perf/Documentation/examples.txt (revision 2932cffc89e9a1476b28a59896fa4f81e0d4f131)
1*2932cffcSCarlos R. Mafra
2*2932cffcSCarlos R. Mafra		------------------------------
3*2932cffcSCarlos R. Mafra		****** perf by examples ******
4*2932cffcSCarlos R. Mafra		------------------------------
5*2932cffcSCarlos R. Mafra
6*2932cffcSCarlos R. Mafra[ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ]
7*2932cffcSCarlos R. Mafra
8*2932cffcSCarlos R. Mafra
9*2932cffcSCarlos R. MafraFirst, discovery/enumeration of available counters can be done via
10*2932cffcSCarlos R. Mafra'perf list':
11*2932cffcSCarlos R. Mafra
12*2932cffcSCarlos R. Mafratitan:~> perf list
13*2932cffcSCarlos R. Mafra  [...]
14*2932cffcSCarlos R. Mafra  kmem:kmalloc                             [Tracepoint event]
15*2932cffcSCarlos R. Mafra  kmem:kmem_cache_alloc                    [Tracepoint event]
16*2932cffcSCarlos R. Mafra  kmem:kmalloc_node                        [Tracepoint event]
17*2932cffcSCarlos R. Mafra  kmem:kmem_cache_alloc_node               [Tracepoint event]
18*2932cffcSCarlos R. Mafra  kmem:kfree                               [Tracepoint event]
19*2932cffcSCarlos R. Mafra  kmem:kmem_cache_free                     [Tracepoint event]
20*2932cffcSCarlos R. Mafra  kmem:mm_page_free_direct                 [Tracepoint event]
21*2932cffcSCarlos R. Mafra  kmem:mm_pagevec_free                     [Tracepoint event]
22*2932cffcSCarlos R. Mafra  kmem:mm_page_alloc                       [Tracepoint event]
23*2932cffcSCarlos R. Mafra  kmem:mm_page_alloc_zone_locked           [Tracepoint event]
24*2932cffcSCarlos R. Mafra  kmem:mm_page_pcpu_drain                  [Tracepoint event]
25*2932cffcSCarlos R. Mafra  kmem:mm_page_alloc_extfrag               [Tracepoint event]
26*2932cffcSCarlos R. Mafra
27*2932cffcSCarlos R. MafraThen any (or all) of the above event sources can be activated and
28*2932cffcSCarlos R. Maframeasured. For example the page alloc/free properties of a 'hackbench
29*2932cffcSCarlos R. Mafrarun' are:
30*2932cffcSCarlos R. Mafra
31*2932cffcSCarlos R. Mafra titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc
32*2932cffcSCarlos R. Mafra -e kmem:mm_pagevec_free -e kmem:mm_page_free_direct ./hackbench 10
33*2932cffcSCarlos R. Mafra Time: 0.575
34*2932cffcSCarlos R. Mafra
35*2932cffcSCarlos R. Mafra Performance counter stats for './hackbench 10':
36*2932cffcSCarlos R. Mafra
37*2932cffcSCarlos R. Mafra          13857  kmem:mm_page_pcpu_drain
38*2932cffcSCarlos R. Mafra          27576  kmem:mm_page_alloc
39*2932cffcSCarlos R. Mafra           6025  kmem:mm_pagevec_free
40*2932cffcSCarlos R. Mafra          20934  kmem:mm_page_free_direct
41*2932cffcSCarlos R. Mafra
42*2932cffcSCarlos R. Mafra    0.613972165  seconds time elapsed
43*2932cffcSCarlos R. Mafra
44*2932cffcSCarlos R. MafraYou can observe the statistical properties as well, by using the
45*2932cffcSCarlos R. Mafra'repeat the workload N times' feature of perf stat:
46*2932cffcSCarlos R. Mafra
47*2932cffcSCarlos R. Mafra titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e
48*2932cffcSCarlos R. Mafra   kmem:mm_page_alloc -e kmem:mm_pagevec_free -e
49*2932cffcSCarlos R. Mafra   kmem:mm_page_free_direct ./hackbench 10
50*2932cffcSCarlos R. Mafra Time: 0.627
51*2932cffcSCarlos R. Mafra Time: 0.644
52*2932cffcSCarlos R. Mafra Time: 0.564
53*2932cffcSCarlos R. Mafra Time: 0.559
54*2932cffcSCarlos R. Mafra Time: 0.626
55*2932cffcSCarlos R. Mafra
56*2932cffcSCarlos R. Mafra Performance counter stats for './hackbench 10' (5 runs):
57*2932cffcSCarlos R. Mafra
58*2932cffcSCarlos R. Mafra          12920  kmem:mm_page_pcpu_drain    ( +-   3.359% )
59*2932cffcSCarlos R. Mafra          25035  kmem:mm_page_alloc         ( +-   3.783% )
60*2932cffcSCarlos R. Mafra           6104  kmem:mm_pagevec_free       ( +-   0.934% )
61*2932cffcSCarlos R. Mafra          18376  kmem:mm_page_free_direct   ( +-   4.941% )
62*2932cffcSCarlos R. Mafra
63*2932cffcSCarlos R. Mafra    0.643954516  seconds time elapsed   ( +-   2.363% )
64*2932cffcSCarlos R. Mafra
65*2932cffcSCarlos R. MafraFurthermore, these tracepoints can be used to sample the workload as
66*2932cffcSCarlos R. Mafrawell. For example the page allocations done by a 'git gc' can be
67*2932cffcSCarlos R. Mafracaptured the following way:
68*2932cffcSCarlos R. Mafra
69*2932cffcSCarlos R. Mafra titan:~/git> perf record -f -e kmem:mm_page_alloc -c 1 ./git gc
70*2932cffcSCarlos R. Mafra Counting objects: 1148, done.
71*2932cffcSCarlos R. Mafra Delta compression using up to 2 threads.
72*2932cffcSCarlos R. Mafra Compressing objects: 100% (450/450), done.
73*2932cffcSCarlos R. Mafra Writing objects: 100% (1148/1148), done.
74*2932cffcSCarlos R. Mafra Total 1148 (delta 690), reused 1148 (delta 690)
75*2932cffcSCarlos R. Mafra [ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ]
76*2932cffcSCarlos R. Mafra
77*2932cffcSCarlos R. MafraTo check which functions generated page allocations:
78*2932cffcSCarlos R. Mafra
79*2932cffcSCarlos R. Mafra titan:~/git> perf report
80*2932cffcSCarlos R. Mafra # Samples: 10646
81*2932cffcSCarlos R. Mafra #
82*2932cffcSCarlos R. Mafra # Overhead          Command               Shared Object
83*2932cffcSCarlos R. Mafra # ........  ...............  ..........................
84*2932cffcSCarlos R. Mafra #
85*2932cffcSCarlos R. Mafra    23.57%       git-repack  /lib64/libc-2.5.so
86*2932cffcSCarlos R. Mafra    21.81%              git  /lib64/libc-2.5.so
87*2932cffcSCarlos R. Mafra    14.59%              git  ./git
88*2932cffcSCarlos R. Mafra    11.79%       git-repack  ./git
89*2932cffcSCarlos R. Mafra     7.12%              git  /lib64/ld-2.5.so
90*2932cffcSCarlos R. Mafra     3.16%       git-repack  /lib64/libpthread-2.5.so
91*2932cffcSCarlos R. Mafra     2.09%       git-repack  /bin/bash
92*2932cffcSCarlos R. Mafra     1.97%               rm  /lib64/libc-2.5.so
93*2932cffcSCarlos R. Mafra     1.39%               mv  /lib64/ld-2.5.so
94*2932cffcSCarlos R. Mafra     1.37%               mv  /lib64/libc-2.5.so
95*2932cffcSCarlos R. Mafra     1.12%       git-repack  /lib64/ld-2.5.so
96*2932cffcSCarlos R. Mafra     0.95%               rm  /lib64/ld-2.5.so
97*2932cffcSCarlos R. Mafra     0.90%  git-update-serv  /lib64/libc-2.5.so
98*2932cffcSCarlos R. Mafra     0.73%  git-update-serv  /lib64/ld-2.5.so
99*2932cffcSCarlos R. Mafra     0.68%             perf  /lib64/libpthread-2.5.so
100*2932cffcSCarlos R. Mafra     0.64%       git-repack  /usr/lib64/libz.so.1.2.3
101*2932cffcSCarlos R. Mafra
102*2932cffcSCarlos R. MafraOr to see it on a more finegrained level:
103*2932cffcSCarlos R. Mafra
104*2932cffcSCarlos R. Mafratitan:~/git> perf report --sort comm,dso,symbol
105*2932cffcSCarlos R. Mafra# Samples: 10646
106*2932cffcSCarlos R. Mafra#
107*2932cffcSCarlos R. Mafra# Overhead          Command               Shared Object  Symbol
108*2932cffcSCarlos R. Mafra# ........  ...............  ..........................  ......
109*2932cffcSCarlos R. Mafra#
110*2932cffcSCarlos R. Mafra     9.35%       git-repack  ./git                       [.] insert_obj_hash
111*2932cffcSCarlos R. Mafra     9.12%              git  ./git                       [.] insert_obj_hash
112*2932cffcSCarlos R. Mafra     7.31%              git  /lib64/libc-2.5.so          [.] memcpy
113*2932cffcSCarlos R. Mafra     6.34%       git-repack  /lib64/libc-2.5.so          [.] _int_malloc
114*2932cffcSCarlos R. Mafra     6.24%       git-repack  /lib64/libc-2.5.so          [.] memcpy
115*2932cffcSCarlos R. Mafra     5.82%       git-repack  /lib64/libc-2.5.so          [.] __GI___fork
116*2932cffcSCarlos R. Mafra     5.47%              git  /lib64/libc-2.5.so          [.] _int_malloc
117*2932cffcSCarlos R. Mafra     2.99%              git  /lib64/libc-2.5.so          [.] memset
118*2932cffcSCarlos R. Mafra
119*2932cffcSCarlos R. MafraFurthermore, call-graph sampling can be done too, of page
120*2932cffcSCarlos R. Mafraallocations - to see precisely what kind of page allocations there
121*2932cffcSCarlos R. Mafraare:
122*2932cffcSCarlos R. Mafra
123*2932cffcSCarlos R. Mafra titan:~/git> perf record -f -g -e kmem:mm_page_alloc -c 1 ./git gc
124*2932cffcSCarlos R. Mafra Counting objects: 1148, done.
125*2932cffcSCarlos R. Mafra Delta compression using up to 2 threads.
126*2932cffcSCarlos R. Mafra Compressing objects: 100% (450/450), done.
127*2932cffcSCarlos R. Mafra Writing objects: 100% (1148/1148), done.
128*2932cffcSCarlos R. Mafra Total 1148 (delta 690), reused 1148 (delta 690)
129*2932cffcSCarlos R. Mafra [ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ]
130*2932cffcSCarlos R. Mafra
131*2932cffcSCarlos R. Mafra titan:~/git> perf report -g
132*2932cffcSCarlos R. Mafra # Samples: 10686
133*2932cffcSCarlos R. Mafra #
134*2932cffcSCarlos R. Mafra # Overhead          Command               Shared Object
135*2932cffcSCarlos R. Mafra # ........  ...............  ..........................
136*2932cffcSCarlos R. Mafra #
137*2932cffcSCarlos R. Mafra    23.25%       git-repack  /lib64/libc-2.5.so
138*2932cffcSCarlos R. Mafra                |
139*2932cffcSCarlos R. Mafra                |--50.00%-- _int_free
140*2932cffcSCarlos R. Mafra                |
141*2932cffcSCarlos R. Mafra                |--37.50%-- __GI___fork
142*2932cffcSCarlos R. Mafra                |          make_child
143*2932cffcSCarlos R. Mafra                |
144*2932cffcSCarlos R. Mafra                |--12.50%-- ptmalloc_unlock_all2
145*2932cffcSCarlos R. Mafra                |          make_child
146*2932cffcSCarlos R. Mafra                |
147*2932cffcSCarlos R. Mafra                 --6.25%-- __GI_strcpy
148*2932cffcSCarlos R. Mafra    21.61%              git  /lib64/libc-2.5.so
149*2932cffcSCarlos R. Mafra                |
150*2932cffcSCarlos R. Mafra                |--30.00%-- __GI_read
151*2932cffcSCarlos R. Mafra                |          |
152*2932cffcSCarlos R. Mafra                |           --83.33%-- git_config_from_file
153*2932cffcSCarlos R. Mafra                |                     git_config
154*2932cffcSCarlos R. Mafra                |                     |
155*2932cffcSCarlos R. Mafra   [...]
156*2932cffcSCarlos R. Mafra
157*2932cffcSCarlos R. MafraOr you can observe the whole system's page allocations for 10
158*2932cffcSCarlos R. Mafraseconds:
159*2932cffcSCarlos R. Mafra
160*2932cffcSCarlos R. Mafratitan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e
161*2932cffcSCarlos R. Mafrakmem:mm_page_alloc -e kmem:mm_pagevec_free -e
162*2932cffcSCarlos R. Mafrakmem:mm_page_free_direct sleep 10
163*2932cffcSCarlos R. Mafra
164*2932cffcSCarlos R. Mafra Performance counter stats for 'sleep 10':
165*2932cffcSCarlos R. Mafra
166*2932cffcSCarlos R. Mafra         171585  kmem:mm_page_pcpu_drain
167*2932cffcSCarlos R. Mafra         322114  kmem:mm_page_alloc
168*2932cffcSCarlos R. Mafra          73623  kmem:mm_pagevec_free
169*2932cffcSCarlos R. Mafra         254115  kmem:mm_page_free_direct
170*2932cffcSCarlos R. Mafra
171*2932cffcSCarlos R. Mafra   10.000591410  seconds time elapsed
172*2932cffcSCarlos R. Mafra
173*2932cffcSCarlos R. MafraOr observe how fluctuating the page allocations are, via statistical
174*2932cffcSCarlos R. Mafraanalysis done over ten 1-second intervals:
175*2932cffcSCarlos R. Mafra
176*2932cffcSCarlos R. Mafra titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e
177*2932cffcSCarlos R. Mafra   kmem:mm_page_alloc -e kmem:mm_pagevec_free -e
178*2932cffcSCarlos R. Mafra   kmem:mm_page_free_direct sleep 1
179*2932cffcSCarlos R. Mafra
180*2932cffcSCarlos R. Mafra Performance counter stats for 'sleep 1' (10 runs):
181*2932cffcSCarlos R. Mafra
182*2932cffcSCarlos R. Mafra          17254  kmem:mm_page_pcpu_drain    ( +-   3.709% )
183*2932cffcSCarlos R. Mafra          34394  kmem:mm_page_alloc         ( +-   4.617% )
184*2932cffcSCarlos R. Mafra           7509  kmem:mm_pagevec_free       ( +-   4.820% )
185*2932cffcSCarlos R. Mafra          25653  kmem:mm_page_free_direct   ( +-   3.672% )
186*2932cffcSCarlos R. Mafra
187*2932cffcSCarlos R. Mafra    1.058135029  seconds time elapsed   ( +-   3.089% )
188*2932cffcSCarlos R. Mafra
189*2932cffcSCarlos R. MafraOr you can annotate the recorded 'git gc' run on a per symbol basis
190*2932cffcSCarlos R. Mafraand check which instructions/source-code generated page allocations:
191*2932cffcSCarlos R. Mafra
192*2932cffcSCarlos R. Mafra titan:~/git> perf annotate __GI___fork
193*2932cffcSCarlos R. Mafra ------------------------------------------------
194*2932cffcSCarlos R. Mafra  Percent |      Source code & Disassembly of libc-2.5.so
195*2932cffcSCarlos R. Mafra ------------------------------------------------
196*2932cffcSCarlos R. Mafra          :
197*2932cffcSCarlos R. Mafra          :
198*2932cffcSCarlos R. Mafra          :      Disassembly of section .plt:
199*2932cffcSCarlos R. Mafra          :      Disassembly of section .text:
200*2932cffcSCarlos R. Mafra          :
201*2932cffcSCarlos R. Mafra          :      00000031a2e95560 <__fork>:
202*2932cffcSCarlos R. Mafra [...]
203*2932cffcSCarlos R. Mafra     0.00 :        31a2e95602:   b8 38 00 00 00          mov    $0x38,%eax
204*2932cffcSCarlos R. Mafra     0.00 :        31a2e95607:   0f 05                   syscall
205*2932cffcSCarlos R. Mafra    83.42 :        31a2e95609:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
206*2932cffcSCarlos R. Mafra     0.00 :        31a2e9560f:   0f 87 4d 01 00 00       ja     31a2e95762 <__fork+0x202>
207*2932cffcSCarlos R. Mafra     0.00 :        31a2e95615:   85 c0                   test   %eax,%eax
208*2932cffcSCarlos R. Mafra
209*2932cffcSCarlos R. Mafra( this shows that 83.42% of __GI___fork's page allocations come from
210*2932cffcSCarlos R. Mafra  the 0x38 system call it performs. )
211*2932cffcSCarlos R. Mafra
212*2932cffcSCarlos R. Mafraetc. etc. - a lot more is possible. I could list a dozen of
213*2932cffcSCarlos R. Mafraother different usecases straight away - neither of which is
214*2932cffcSCarlos R. Mafrapossible via /proc/vmstat.
215*2932cffcSCarlos R. Mafra
216*2932cffcSCarlos R. Mafra/proc/vmstat is not in the same league really, in terms of
217*2932cffcSCarlos R. Mafraexpressive power of system analysis and performance
218*2932cffcSCarlos R. Mafraanalysis.
219*2932cffcSCarlos R. Mafra
220*2932cffcSCarlos R. MafraAll that the above results needed were those new tracepoints
221*2932cffcSCarlos R. Mafrain include/tracing/events/kmem.h.
222*2932cffcSCarlos R. Mafra
223*2932cffcSCarlos R. Mafra	Ingo
224*2932cffcSCarlos R. Mafra
225*2932cffcSCarlos R. Mafra
226