.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd March 24, 2023
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl A
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl U
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl i Ar lwp
.Op Fl k Ar kerneldir
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl u Ar event-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes are executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs till
.Ar command
exits, or till all target processes specified by the
.Fl t Ar process-spec
options exit, or till the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements till interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human-readable form
at regular intervals by
.Nm .
The format of
.Nm Ns 's
human-readable textual output is not stable, and could change
in the future.
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl A
Skip symbol lookup and display address instead.
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl I
Show the offset of the instruction pointer into the symbol.
.It Fl L
List all event names.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a
.Xr top 1 Ns -like
mode for sampling PMCs.
The following hotkeys
can be used:
.Pp
.Bl -tag -compact -width "Ctrl+a" -offset 4n
.It Ic A
Toggle symbol resolution
.Sm off
.It Ic Ctrl + a
.Sm on
Switch to accumulative mode
.Sm off
.It Ic Ctrl + d
.Sm on
Switch to delta mode
.It Ic f
Represent the
.Dq f
cost under
threshold as a dot (calltree only)
.It Ic I
Toggle showing offsets into symbols
.It Ic m
Merge PMCs
.It Ic n
Change view
.It Ic p
Show next PMC
.It Ic q
Quit
.It Ic Space
Pause
.El
.It Fl U
Toggle capturing user-space call traces while in kernel mode.
The default is for sampling PMCs to capture user-space callchain information
while in user-space mode, and kernel callchain information while in kernel mode.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m
that only resolves the first symbol in the graph, this resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma-separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must be passed on the command line before
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S .
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl i Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl k Ar kerneldir
Set the pathname of the kernel directory to argument
.Ar kerneldir .
This directory specifies where
.Nm
should look for the kernel and its modules.
The default is to use the path of the running kernel obtained from the
.Va kern.bootfile
sysctl.
Modules will also be searched for in /boot/modules if not found in
.Ar kerneldir .
.It Fl l Ar secs
Set system-wide performance measurement duration for
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs with the name and the start and end addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl u Ar event-spec
Provide a short description of the event.
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction tlb-misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
It is
.Ud
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.