.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed.  in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd May 31, 2023
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl A
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl U
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl i Ar lwp
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl u Ar event-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs until
.Ar command
exits, or until all target processes specified by the
.Fl t Ar process-spec
options exit, or until the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements until interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human-readable form
at regular intervals by
.Nm .
The format of
.Nm Ns 's
human-readable textual output is not stable, and could change
in the future.
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl A
Skip symbol lookup and display addresses instead.
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl I
Show the offset of the instruction pointer into the symbol.
.It Fl L
List all event names.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a
.Xr top 1 Ns -like
mode for sampling PMCs.
The following hotkeys can be used:
.Pp
.Bl -tag -compact -width "Ctrl+a" -offset 4n
.It Ic A
Toggle symbol resolution
.Sm off
.It Ic Ctrl + a
.Sm on
Switch to accumulative mode
.Sm off
.It Ic Ctrl + d
.Sm on
Switch to delta mode
.It Ic f
Represent the
.Dq f
cost under
threshold as a dot (calltree only)
.It Ic I
Toggle showing offsets into symbols
.It Ic m
Merge PMCs
.It Ic n
Change view
.It Ic p
Show next PMC
.It Ic q
Quit
.It Ic Space
Pause
.El
.It Fl U
Toggle capturing user-space call traces while in kernel mode.
The default is for sampling PMCs to capture user-space callchain information
while in user-space mode, and kernel callchain information while in kernel mode.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which only resolves the first symbol in the graph, this option resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma-separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must appear on the command line before
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S .
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl i Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl l Ar secs
Set system-wide performance measurement duration for
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs with the name, the start and ending addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl u Ar event-spec
Provide a short description of the event named by
.Ar event-spec .
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
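.Pp
As a rough illustration of how the argument to
.Fl O
is interpreted, the following
.Xr sh 1
sketch applies the rule documented above: a name beginning with
.Ql \&.
or
.Ql /
is always a pathname, and any other name containing a colon is treated as
.Ar hostname Ns : Ns Ar port .
The function name
.Li classify_log_target
is hypothetical, and splitting at the last colon is an assumption;
this is not the parsing code used by
.Nm .
.Bd -literal -offset indent
```shell
# Illustrative only: mirrors the documented -O rule, not pmcstat itself.
# classify_log_target is a hypothetical helper name.
classify_log_target() {
    case "$1" in
        .*|/*)
            # A leading "." or "/" always means a pathname.
            echo "file $1" ;;
        *:*)
            # Assumed split at the last colon into host and port.
            echo "socket ${1%:*} ${1##*:}" ;;
        *)
            # No colon at all: an ordinary pathname.
            echo "file $1" ;;
    esac
}

classify_log_target remotehost:9000
classify_log_target ./sample:1
```
.Ed
.Pp
Under this reading, a log file whose name contains a colon can still be
written locally by prefixing it with
.Ql ./ .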
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 retired instructions and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following
diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.