.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.Dd April 15, 2026
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl A
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl U
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl i Ar lwp
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl u Ar event-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs till
.Ar command
exits, or till all target processes specified by the
.Fl t Ar process-spec
options exit, or till the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements till interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human readable form
at regular intervals by
.Nm .
The format of
.Nm Ns 's
human-readable textual output is not stable, and could change
in the future.
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl A
Skip symbol lookup and display address instead.
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is to not enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl I
Show the offset of the instruction pointer into the symbol.
.It Fl L
List all event names.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
Each decoded record is printed as a single line with the following fields:
a record type (e.g.,
.Dq callchain ,
.Dq initlog ) ,
type-specific data, and a trailing 20-digit raw TSC value recording the
CPU cycle at which the event occurred.
The
.Dq initlog
record additionally prints
.Dq Li tsc_freq=<hz> ,
the TSC tick rate in Hz measured by the kernel at boot.
To convert a TSC delta to nanoseconds:
.Pp
.Dl elapsed_ns = (tsc_end - tsc_start) * 1e9 / tsc_freq
.Pp
TSC-based timestamps and
.Dq Li tsc_freq
are only meaningful on x86 architectures
.Pq amd64 and i386 .
On all other architectures
.Pq including arm64 and powerpc
the
.Dq Li tsc_freq
field will be zero.
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a
.Xr top 1 Ns -like
mode for sampling PMCs.
The following hotkeys can be used:
.Pp
.Bl -tag -compact -width "Ctrl+a" -offset 4n
.It Ic A
Toggle symbol resolution
.Sm off
.It Ic Ctrl + a
.Sm on
Switch to accumulative mode
.Sm off
.It Ic Ctrl + d
.Sm on
Switch to delta mode
.It Ic f
Represent the
.Dq f
cost under
threshold as a dot (calltree only)
.It Ic I
Toggle showing offsets into symbols
.It Ic m
Merge PMCs
.It Ic n
Change view
.It Ic p
Show next PMC
.It Ic q
Quit
.It Ic Space
Pause
.El
.It Fl U
Toggle capturing user-space call traces while in kernel mode.
The default is for sampling PMCs to capture user-space callchain information
while in user-space mode, and kernel callchain information while in kernel mode.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which resolves only the first symbol in the graph, this option resolves
every node in the callgraph, or prints addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must appear on the command line before
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S .
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl i Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl l Ar secs
Set system-wide performance measurement duration for
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs with the name and the start and end addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl u Ar event-spec
Provide a short description of the event.
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the
sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr execve 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr execve 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr execve 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.