.\" Copyright (c) 2003-2007 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd April 23, 2007
.Os
.Dt PMCSTAT 8
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl G Ar pathname
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl W
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl g
.Op Fl k Ar kerneldir
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs till
.Ar command
exits, or till all target processes specified by the
.Fl t Ar process-spec
options exit, or till the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements till interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human readable form
at regular intervals by
.Nm .
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma-separated list of CPU numbers, or the literal
.Sq *
denoting all CPUs.
The default is to allocate system mode PMCs on all active CPUs in
the system.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl k Ar kerneldir
Set the pathname of the kernel directory to argument
.Ar kerneldir .
This directory specifies where
.Nm
should look for the kernel and its modules.
The default is
.Pa /boot/kernel .
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
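.Pp
The rule that the
.Fl O
option applies to tell a network destination from a file name can be
sketched as the shell function below.
This is only an illustration of the rule as documented above, not the
code used by
.Nm ;
the function name
.Li classify_log_target
is hypothetical, and requiring both the host and port parts to be
non-empty is an assumption.
.Bd -literal -offset indent
```shell
# Hypothetical helper: classify an -O argument per the documented rule.
classify_log_target() {
    case "$1" in
        .*|/*)  echo "file" ;;    # a leading "." or "/" always means a file
        ?*:?*)  echo "socket" ;;  # hostname:port form (assumed non-empty parts)
        *)      echo "file" ;;    # anything else is treated as a file name
    esac
}
```
.Ed
.Pp
Under these assumptions, an argument such as
.Pa ./host:port
is classified as a file because of its leading
.Ql \&. ,
while an argument of the form
.Ar hostname Ns : Ns Ar port
is classified as a socket.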
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm mozilla
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses mozilla"
.Pp
To measure processor instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
It is
.Ud
.Sh AUTHORS
.An Joseph Koshy Aq jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.