.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed.  in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd May 25, 2018
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L Ar lwp
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl k Ar kerneldir
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes is executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs until
.Ar command
exits, or until all target processes specified by the
.Fl t Ar process-spec
options exit, or until the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements until interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human-readable form
at regular intervals by
.Nm .
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl C
Toggle between showing cumulative and incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
.It Fl I
Skip symbol lookup and display addresses instead.
.It Fl L Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li - ,
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a top-like mode for sampling PMCs.
The following hotkeys
can be used: 'c+a' switch to accumulative mode, 'c+d' switch
to delta mode, 'm' merge PMCs, 'n' change view, 'p' show next
PMC, ' ' pause, 'q' quit.
Calltree only: 'f' cost under threshold
is seen as a dot.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which only resolves the first symbol in the graph, this resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must appear on the command line before
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S .
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl k Ar kerneldir
Set the pathname of the kernel directory to argument
.Ar kerneldir .
This directory specifies where
.Nm
should look for the kernel and its modules.
The default is to use the path of the running kernel obtained from the
.Va kern.bootfile
sysctl.
.It Fl l Ar secs
Set system-wide performance measurement duration for
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs with the name and the start and end addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
It is
.Ud
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.