.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.Dd May 31, 2023
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl A
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl U
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl i Ar lwp
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl u Ar event-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement continues until
.Ar command
exits, or until all target processes specified by the
.Fl t Ar process-spec
options exit, or until the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements until interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human-readable form
at regular intervals by
.Nm .
The format of
.Nm Ns 's
human-readable textual output is not stable, and could change
in the future.
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine-dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl A
Skip symbol lookup and display addresses instead.
.It Fl C
Toggle between showing cumulative and incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
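.Pp
The
.Fl F ,
.Fl G ,
.Fl M
and
.Fl m
options share this
.Dq Li -
convention.
The C fragment below is a minimal sketch of it, assuming the named file is
opened for writing; it is not taken from the
.Nm
sources, and the helper
.Fn open_report_stream
is hypothetical.
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: pick the stream for an option such as -G.
 * A pathname of "-" reuses outputfile (the stream chosen with -o);
 * any other pathname is opened as a new file for writing.
 * Returns NULL if the file cannot be opened.
 */
FILE *
open_report_stream(const char *pathname, FILE *outputfile)
{
	if (strcmp(pathname, "-") == 0)
		return (outputfile);
	return (fopen(pathname, "w"));
}
.Ed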
.It Fl I
Show the offset of the instruction pointer into the symbol.
.It Fl L
List all event names.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li - ,
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a
.Xr top 1 Ns -like
mode for sampling PMCs.
The following hotkeys can be used:
.Pp
.Bl -tag -compact -width "Ctrl+a" -offset 4n
.It Ic A
Toggle symbol resolution
.Sm off
.It Ic Ctrl + a
.Sm on
Switch to accumulative mode
.Sm off
.It Ic Ctrl + d
.Sm on
Switch to delta mode
.It Ic f
Represent the
.Dq f
cost under
threshold as a dot (calltree only)
.It Ic I
Toggle showing offsets into symbols
.It Ic m
Merge PMCs
.It Ic n
Change view
.It Ic p
Show next PMC
.It Ic q
Quit
.It Ic Space
Pause
.El
.It Fl U
Toggle capturing user-space call traces while in kernel mode.
The default is for sampling PMCs to capture user-space callchain information
while in user-space mode, and kernel callchain information while in kernel mode.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which only resolves the first symbol in the graph, this resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma-separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
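.Pp
As an illustration only, the C function below sketches how a
.Ar cpu-spec
of this form might be turned into a bitmask with bit N set for CPU N.
It is not the parser used by
.Nm ;
the function
.Fn parse_cpu_spec
is hypothetical and the sketch assumes at most 64 CPUs.
.Bd -literal -offset indent
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the -c cpu-spec syntax: "*" selects all
 * ncpus CPUs, otherwise the spec is a comma-separated list of CPU
 * numbers.  On success *mask has bit N set for each selected CPU N
 * and 0 is returned; -1 is returned for a malformed or out-of-range
 * spec.  At most 64 CPUs are assumed.
 */
int
parse_cpu_spec(const char *spec, int ncpus, uint64_t *mask)
{
	const char *p = spec;
	char *end;
	long cpu;

	if (ncpus < 1 || ncpus > 64)
		return (-1);
	if (strcmp(spec, "*") == 0) {
		/* All CPUs; avoid an undefined 64-bit shift when ncpus == 64. */
		*mask = (ncpus == 64) ? UINT64_MAX :
		    (UINT64_C(1) << ncpus) - 1;
		return (0);
	}
	*mask = 0;
	for (;;) {
		/* Each field must start with a digit: no sign, no empty field. */
		if (*p < '0' || *p > '9')
			return (-1);
		cpu = strtol(p, &end, 10);
		if (cpu >= ncpus)
			return (-1);	/* no such CPU */
		*mask |= UINT64_C(1) << cpu;
		if (*end == 0)
			return (0);	/* end of list */
		if (*end != ',')
			return (-1);	/* junk after a CPU number */
		p = end + 1;
	}
}
.Ed
.Pp
With this sketch, on a four-CPU system the argument
.Ql 0,2
yields the mask 0x5 and
.Ql *
yields 0xf.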
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must be specified on the command line before the
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S
options it is meant to affect.
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass the option string
.Ar pluginopt
to the active plugin.
Options understood in the top mode selected by
.Fl T
include:
.Bl -tag -width "skiplink=0|1"
.It Li threshold= Ns Ar float
Do not display entries whose cost is below the specified value.
.It Li skiplink=0|1
Replace a node whose cost is below the threshold with a dot.
.El
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl i Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl l Ar secs
Set the duration of system-wide performance measurement to
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs along with the name and the start and end addresses
of the function in which each one lies.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li - ,
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl u Ar event-spec
Provide a short description of the event specified by
.Ar event-spec .
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs, or, in top mode, of
sampling mode PMCs, every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
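.Pp
The two forms of
.Fl t Ar process-spec
can be distinguished as in the following C sketch.
It assumes that a spec consisting only of decimal digits is a process id
and that any other spec is a POSIX extended regular expression matched
against command names; the function names are hypothetical and this is
not the implementation used by
.Nm .
.Bd -literal -offset indent
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the two -t process-spec forms.  A spec made
 * only of decimal digits names one process id, which is returned.
 * Any other spec is compiled into *re as a POSIX extended regular
 * expression; -1 is returned on success and -2 if it does not compile.
 */
long
classify_process_spec(const char *spec, regex_t *re)
{
	const char *p;

	for (p = spec; *p >= '0' && *p <= '9'; p++)
		continue;
	if (p != spec && *p == 0)
		return (strtol(spec, NULL, 10));
	if (regcomp(re, spec, REG_EXTENDED | REG_NOSUB) != 0)
		return (-2);
	return (-1);
}

/* Return nonzero if the command name comm is selected by re. */
int
matches_command(const regex_t *re, const char *comm)
{
	return (regexec(re, comm, 0, NULL, 0) == 0);
}
.Ed
.Pp
With this sketch,
.Ql 1234
is taken as a process id, while
.Ql ^emacs$
selects only processes whose command name is exactly
.Dq emacs .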
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU, with
a sample taken every 32768 retired instructions and the samples logged
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs ,
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds, use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system, use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it, use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired, use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired, use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine, use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file, use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph" ,
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.