.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd April 29, 2014
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl k Ar kerneldir
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs till
.Ar command
exits, or till all target processes specified by the
.Fl t Ar process-spec
options exit, or till the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements till interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human readable form
at regular intervals by
.Nm .
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is to not enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a top-like mode for sampling PMCs.
The following hotkeys
can be used: 'c+a' switch to accumulative mode, 'c+d' switch
to delta mode, 'm' merge PMCs, 'n' change view, 'p' show next
PMC, ' ' pause, 'q' quit.
Calltree only: 'f' cost under threshold
is seen as a dot.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which resolves only the first symbol in the graph, this resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma separated list of CPU numbers, or the literal
.Sq *
denoting all unhalted CPUs.
The default is to allocate system mode PMCs on all unhalted
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl k Ar kerneldir
Set the pathname of the kernel directory to argument
.Ar kerneldir .
This directory specifies where
.Nm
should look for the kernel and its modules.
The default is
.Pa /boot/kernel .
.It Fl m Ar pathname
Print the sampled PCs with the name, the start and ending addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
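.Pp
Options such as
.Fl C
that are described as applying to subsequent PMCs take effect only for
the PMCs that follow them on the command line, so option order matters.
As an illustrative sketch, the following invocation would be expected to
report cumulative counts for the first PMC and incremental counts for the
second.
The event names are borrowed from the examples in this manual page and are
assumed to be available on the CPU in use;
.Ar command
stands for any program to run:
.Bd -literal -offset indent
pmcstat -C -p instructions -C -p ic-misses command
.Ed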
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following
diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
It is
.Ud
.Sh AUTHORS
.An Joseph Koshy Aq jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.