.\" Copyright (c) 2003-2005 Joseph Koshy
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd Apr 15, 2005
.Dt HWPMC 4
.Os
.Sh NAME
.Nm hwpmc
.Nd Hardware performance monitoring counter support
.Sh SYNOPSIS
.Cd options PMC_HOOKS
.br
.Cd device hwpmc
.Sh DESCRIPTION
The
.Nm
driver virtualizes the hardware performance monitoring facilities in
modern CPUs and provides support for using these facilities from
user level processes.
.Pp
The driver supports multi-processor systems.
.Pp
PMCs are allocated using the
.Ic PMC_OP_PMCALLOCATE
request.
A successful
.Ic PMC_OP_PMCALLOCATE
request will return an integer handle (typically a small integer) to
the requesting process.
Subsequent operations on the allocated PMC use this handle to denote
the specific PMC.
A process that has successfully allocated a PMC is termed an
.Dq "owner process" .
.Pp
PMCs may be allocated to operate in process-private or in system-wide
modes.
.Bl -hang -width "XXXXXXXXXXXXXXX"
.It Em Process-private
In process-private mode, a PMC is active only when a thread belonging
to a process it is attached to is scheduled on a CPU.
.It Em System-wide
In system-wide mode a PMC operates independently of processes and
measures hardware events for the system as a whole.
.El
.Pp
The
.Nm
driver supports the use of hardware PMCs for counting or for
sampling:
.Bl -hang -width "XXXXXXXXX"
.It Em Counting
In counting modes, the PMCs count hardware events.
These counts are retrievable using the
.Ic PMC_OP_PMCRW
operation on all architectures, though some architectures like the
x86 and amd64 offer faster methods of reading these counts.
.It Em Sampling
In sampling modes, PMCs are configured to sample the CPU
instruction pointer after a configurable number of hardware events
have been observed.
These instruction pointer samples are directed to a log file for
subsequent analysis.
.El
.Pp
These modes of operation are orthogonal; a PMC may be configured to
operate in one of four modes:
.Bl -tag -width indent
.It Process-private, counting
These PMCs count hardware events whenever a thread in their attached process is
scheduled on a CPU.
These PMCs normally count from zero, but the initial count may be
set using the
.Ic PMC_OP_SETCOUNT
operation.
Applications can read the value of the PMC at any time using the
.Ic PMC_OP_PMCRW
operation.
.It Process-private, sampling
These PMCs sample the target process's instruction pointer after they
have seen the configured number of hardware events.
The PMCs only count events when a thread belonging to their attached
process is active.
The desired frequency of sampling is set using the
.Ic PMC_OP_SETCOUNT
operation prior to starting the PMC.
Log files are configured using the
.Ic PMC_OP_CONFIGURELOG
operation.
.It System-wide, counting
These PMCs count hardware events seen by them independent of the
processes that are executing.
The current count on these PMCs can be read using the
.Ic PMC_OP_PMCRW
request.
These PMCs normally count from zero, but the initial count may be
set using the
.Ic PMC_OP_SETCOUNT
operation.
.It System-wide, sampling
These PMCs will periodically sample the instruction pointer of the CPU
they are allocated on, and will write the sample to a log for further
processing.
The desired frequency of sampling is set using the
.Ic PMC_OP_SETCOUNT
operation prior to starting the PMC.
Log files are configured using the
.Ic PMC_OP_CONFIGURELOG
operation.
.Pp
System-wide statistical sampling can only be enabled by a process with
super-user privileges.
.El
.Pp
Processes are allowed to allocate as many PMCs as the hardware and
current operating conditions permit.
Processes may mix allocations of system-wide and process-private
PMCs.
Multiple processes may use the facilities of the
.Nm
driver concurrently.
.Pp
Allocated PMCs are started using the
.Ic PMC_OP_PMCSTART
operation, and stopped using the
.Ic PMC_OP_PMCSTOP
operation.
Stopping and starting a PMC is permitted at any time the owner process
has a valid handle to the PMC.
.Pp
Process-private PMCs need to be attached to a target process before
they can be used.
Attaching a process to a PMC is done using the
.Ic PMC_OP_PMCATTACH
operation.
An already attached PMC may be detached from its target process
using the converse
.Ic PMC_OP_PMCDETACH
operation.
Issuing a
.Ic PMC_OP_PMCSTART
operation on an as yet unattached PMC will cause it to be attached
to its owner process.
The following rules determine whether a given process may attach
a PMC to another target process:
.Bl -bullet -compact
.It
A non-jailed process with super-user privileges is allowed to attach
to any other process in the system.
.It
Other processes are only allowed to attach to targets that they would
be able to attach to for debugging (as determined by
.Xr p_candebug 9 ) .
.El
.Pp
PMCs are released using
.Ic PMC_OP_PMCRELEASE .
After a successful
.Ic PMC_OP_PMCRELEASE
operation the handle to the PMC will become invalid.
.Ss MODIFIER FLAGS
The
.Ic PMC_OP_PMCALLOCATE
operation supports the following flags that modify the behavior
of an allocated PMC:
.Bl -tag -width indent -compact
.It Dv PMC_F_DESCENDANTS
This modifier is valid only for a PMC being allocated in process-private
mode.
It signifies that the PMC will track hardware events for its
target process and the target's current and future descendants.
.It Dv PMC_F_KGMON
This modifier is valid only for a PMC being allocated in system-wide
sampling mode.
It signifies that the PMC's sampling interrupt is to be used to drive
kernel profiling via
.Xr kgmon 8 .
.It Dv PMC_F_LOG_PROCCSW
This modifier is valid only for a PMC being allocated in process-private
mode.
When this modifier is present, at every context switch,
.Nm
will log a record containing the number of hardware events
seen by the target process when it was scheduled on the CPU.
.It Dv PMC_F_LOG_PROCEXIT
This modifier is valid only for a PMC being allocated in process-private
mode.
With this modifier present,
.Nm
will maintain per-process counts for each target process attached to
a PMC.
At process exit time, a record containing the target process' pid and
the accumulated per-process count for that process will be written to the
configured log file.
.El
Modifiers
.Dv PMC_F_LOG_PROCEXIT
and
.Dv PMC_F_LOG_PROCCSW
may be used in combination with modifier
.Dv PMC_F_DESCENDANTS
to track the behaviour of complex pipelines of processes.
PMCs with modifiers
.Dv PMC_F_LOG_PROCEXIT
and
.Dv PMC_F_LOG_PROCCSW
cannot be started until their owner process has configured a log file.
.Ss SIGNALS
The
.Nm
driver may deliver signals to processes that have allocated PMCs:
.Bl -tag -width "XXXXXXXX" -compact
.It Bq SIGIO
A
.Ic PMC_OP_PMCRW
operation was attempted on a process-private PMC that does not have
attached target processes.
.It Bq SIGBUS
The
.Nm
driver is being unloaded from the kernel.
.El
.Sh PROGRAMMING API
The recommended way for application programs to use the facilities of
the
.Nm
driver is using the API provided by the library
.Xr pmc 3 .
.Pp
The
.Nm
driver operates using a system call number that is dynamically
allotted to it when it is loaded into the kernel.
.Pp
The
.Nm
driver supports the following operations:
.Bl -tag -width indent
.It Ic PMC_OP_CONFIGURELOG
Configure a log file for sampling mode PMCs.
.It Ic PMC_OP_FLUSHLOG
Transfer buffered log data inside
.Nm
to a configured output file.
This operation returns to the caller after the write operation
has returned.
.It Ic PMC_OP_GETCPUINFO
Retrieve information about the number of CPUs on the system and
the number of hardware performance monitoring counters available per-CPU.
.It Ic PMC_OP_GETDRIVERSTATS
Retrieve module statistics (for analyzing the behavior of
.Nm
itself).
.It Ic PMC_OP_GETMODULEVERSION
Retrieve the version number of the API.
.It Ic PMC_OP_GETPMCINFO
Retrieve information about the current state of the PMCs on a
given CPU.
.It Ic PMC_OP_PMCADMIN
Set the administrative state (i.e., whether enabled or disabled) for
the hardware PMCs managed by the
.Nm
driver.
.It Ic PMC_OP_PMCALLOCATE
Allocate and configure a PMC.
On successful allocation, a handle to the PMC (a small integer)
is returned.
.It Ic PMC_OP_PMCATTACH
Attach a process mode PMC to a target process.
The PMC will be active whenever a thread in the target process is
scheduled on a CPU.
.Pp
If the
.Dv PMC_F_DESCENDANTS
flag had been specified at PMC allocation time, then the PMC is
attached to all current and future descendants of the target process.
.It Ic PMC_OP_PMCDETACH
Detach a PMC from its target process.
.It Ic PMC_OP_PMCRELEASE
Release a PMC.
.It Ic PMC_OP_PMCRW
Read and write a PMC.
This operation is valid only for PMCs configured in counting modes.
.It Ic PMC_OP_SETCOUNT
Set the initial count (for counting mode PMCs) or the desired sampling
rate (for sampling mode PMCs).
.It Ic PMC_OP_PMCSTART
Start a PMC.
.It Ic PMC_OP_PMCSTOP
Stop a PMC.
.It Ic PMC_OP_WRITELOG
Insert a timestamped user record into the log file.
.El
.Ss i386 SPECIFIC API
Some i386 family CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
.Ic PMC_OP_PMCRW
operation.
On such CPUs, the machine address associated with an allocated PMC is
retrievable using the
.Ic PMC_OP_PMCX86GETMSR
system call.
.Bl -tag -width indent
.It Ic PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
.Pp
The PMC needs to be in process-private mode and allocated without the
.Va PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
time of the call.
.El
.Ss amd64 SPECIFIC API
AMD64 CPUs support the RDPMC instruction which allows a
user process to read a PMC value without needing to invoke a
.Ic PMC_OP_PMCRW
operation.
The machine address associated with an allocated PMC is
retrievable using the
.Ic PMC_OP_PMCX86GETMSR
system call.
.Bl -tag -width indent
.It Ic PMC_OP_PMCX86GETMSR
Retrieve the MSR (machine specific register) number associated with
the given PMC handle.
.Pp
The PMC needs to be in process-private mode and allocated without the
.Va PMC_F_DESCENDANTS
modifier flag, and should be attached only to its owner process at the
time of the call.
.El
.Sh SYSCTL TUNABLES
The behavior of
.Nm
is influenced by the following
.Xr sysctl 8
and
.Xr loader 8
tunables:
.Bl -tag -width indent
.It Va kern.hwpmc.debugflags Pq string, read-write
(Only available if the
.Nm
driver was compiled with
.Fl DDEBUG ) .
Control the verbosity of debug messages from the
.Nm
driver.
.It Va kern.hwpmc.hashsize Pq integer, read-only
The number of rows in the hash-tables used to keep track of owner and
target processes.
The default is 16.
.It Va kern.hwpmc.logbuffersize Pq integer, read-only
The size in kilobytes of each log buffer used by
.Nm Ap s
logging function.
The default buffer size is 4KB.
.It Va kern.hwpmc.mtxpoolsize Pq integer, read-only
The size of the spin mutex pool used by the PMC driver.
The default is 32.
.It Va kern.hwpmc.nbuffers Pq integer, read-only
The number of log buffers used by
.Nm
for logging.
The default is 16.
.It Va kern.hwpmc.nsamples Pq integer, read-only
The number of entries in the per-cpu ring buffer used during sampling.
The default is 16.
.It Va security.bsd.unprivileged_syspmcs Pq boolean, read-write
If set to non-zero, allow unprivileged processes to allocate system-wide
PMCs.
The default value is 0.
.It Va security.bsd.unprivileged_proc_debug Pq boolean, read-write
If set to 0, the
.Nm
driver will only allow privileged processes to attach PMCs to other
processes.
.El
.Pp
These variables may be set in the kernel environment using
.Xr kenv 1
before
.Nm
is loaded.
.Sh SECURITY CONSIDERATIONS
PMCs may be used to monitor the actual behaviour of the system on hardware.
In situations where this constitutes an undesirable information leak,
the following options are available:
.Bl -enum
.It
Set the
.Xr sysctl 8
tunable
.Va "security.bsd.unprivileged_syspmcs"
to 0.
This ensures that unprivileged processes cannot allocate system-wide
PMCs and thus cannot observe the hardware behavior of the system
as a whole.
This tunable may also be set at boot time using
.Xr loader 8 ,
or with
.Xr kenv 1
prior to loading the
.Nm
driver into the kernel.
.It
Set the
.Xr sysctl 8
tunable
.Va "security.bsd.unprivileged_proc_debug"
to 0.
This will ensure that an unprivileged process cannot attach a PMC
to any process other than itself and thus cannot observe the hardware
behavior of other processes with the same credentials.
.El
.Pp
System administrators should note that on IA-32 platforms
.Fx
makes the content of the IA-32 TSC counter available to all processes
via the RDTSC instruction.
.Sh IMPLEMENTATION NOTES
.Ss SMP Symmetry
The kernel driver requires all physical CPUs in an SMP system to have
identical performance monitoring counter hardware.
.Ss i386 TSC Handling
Historically, on the x86 architecture,
.Fx
has permitted user processes running at a processor CPL of 3 to
read the TSC using the RDTSC instruction.
The
.Nm
driver preserves this semantic.
.Ss Intel P4/HTT Handling
On CPUs with HTT support, Intel P4 PMCs are capable of qualifying
only a subset of hardware events on a per-logical CPU basis.
Consequently, if HTT is enabled on a system with Intel Pentium 4
PMCs, then the
.Nm
driver will reject allocation requests for process-private PMCs that
request counting of hardware events that cannot be counted separately
for each logical CPU.
.Ss Intel Pentium-Pro Handling
Writing a value to the PMC MSRs found in Intel Pentium-Pro style PMCs
(found in
.Tn "Intel Pentium Pro" ,
.Tn "Pentium II" ,
.Tn "Pentium III" ,
.Tn "Pentium M"
and
.Tn "Celeron"
processors) will replicate bit 31 of the
value being written into the upper 8 bits of the MSR,
bringing down the usable width of these PMCs to 31 bits.
For process-private PMCs, the
.Nm
driver implements a workaround in software and makes the corrected 64
bit count available via the
.Ic PMC_OP_PMCRW
operation.
Processes that intend to use RDPMC instructions directly or
that intend to write values larger than 2^31 into these PMCs with
.Ic PMC_OP_PMCRW
need to be aware of this hardware limitation.
.Sh DIAGNOSTICS
.Bl -diag
.It hwpmc: tunable hashsize=%d must be greater than zero.
A negative value was supplied for tunable
.Va kern.hwpmc.hashsize .
.It hwpmc: tunable logbuffersize=%d must be greater than zero.
A negative value was supplied for tunable
.Va kern.hwpmc.logbuffersize .
.It hwpmc: tunable nlogbuffers=%d must be greater than zero.
A negative value was supplied for tunable
.Va kern.hwpmc.nbuffers .
.It hwpmc: tunable nsamples=%d out of range.
The value for tunable
.Va kern.hwpmc.nsamples
was negative or greater than 65535.
.El
.Sh ERRORS
A command issued to the
.Nm
driver may fail with the following errors:
.Bl -tag -width Er
.It Bq Er EBUSY
A
.Ic PMC_OP_CONFIGURELOG
operation was requested while an existing log was active.
.It Bq Er EBUSY
A
.Ic DISABLE
operation was requested using the
.Ic PMC_OP_PMCADMIN
request for a set of hardware resources currently in use for
process-private PMCs.
.It Bq Er EBUSY
A
.Ic PMC_OP_PMCADMIN
operation was requested on an active system mode PMC.
.It Bq Er EBUSY
A
.Ic PMC_OP_PMCATTACH
operation was requested for a target process that already had another
PMC using the same hardware resources attached to it.
.It Bq Er EBUSY
A
.Ic PMC_OP_PMCRW
request writing a new value was issued on a PMC that was active.
.It Bq Er EDOOFUS
A
.Ic PMC_OP_PMCSTART
operation was requested without a log file being configured for a
PMC allocated with the
.Dv PMC_F_LOG_PROCCSW
or
.Dv PMC_F_LOG_PROCEXIT
modifiers.
.It Bq Er EBUSY
A
.Ic PMC_OP_PMCSETCOUNT
request was issued on a PMC that was active.
.It Bq Er EEXIST
A
.Ic PMC_OP_PMCATTACH
request was reissued for a target process that already is the target
of this PMC.
.It Bq Er EFAULT
A bad address was passed in to the driver.
.It Bq Er EINVAL
A process specified an invalid PMC handle.
.It Bq Er EINVAL
An invalid CPU number was passed in for a
.Ic PMC_OP_GETPMCINFO
operation.
.It Bq Er EINVAL
An invalid CPU number was passed in for a
.Ic PMC_OP_PMCADMIN
operation.
.It Bq Er EINVAL
An invalid operation request was passed in for a
.Ic PMC_OP_PMCADMIN
operation.
.It Bq Er EINVAL
An invalid PMC id was passed in for a
.Ic PMC_OP_PMCADMIN
operation.
.It Bq Er EINVAL
A suitable PMC matching the parameters passed in to a
.Ic PMC_OP_PMCALLOCATE
request could not be allocated.
.It Bq Er EINVAL
An invalid PMC mode was requested during a
.Ic PMC_OP_PMCALLOCATE
request.
.It Bq Er EINVAL
An invalid CPU number was specified during a
.Ic PMC_OP_PMCALLOCATE
request.
.It Bq Er EINVAL
A CPU other than
.Li PMC_CPU_ANY
was specified in a
.Ic PMC_OP_PMCALLOCATE
request for a process-private PMC.
.It Bq Er EINVAL
A CPU number of
.Li PMC_CPU_ANY
was specified in a
.Ic PMC_OP_PMCALLOCATE
request for a system-wide PMC.
.It Bq Er EINVAL
The
.Ar pm_flags
argument to a
.Ic PMC_OP_PMCALLOCATE
request contained unknown flags.
.It Bq Er EINVAL
A PMC allocated for system-wide operation was specified with a
.Ic PMC_OP_PMCATTACH
request.
.It Bq Er EINVAL
The
.Ar pm_pid
argument to a
.Ic PMC_OP_PMCATTACH
request specified an illegal process id.
.It Bq Er EINVAL
A
.Ic PMC_OP_PMCDETACH
request was issued for a PMC not attached to the target process.
.It Bq Er EINVAL
Argument
.Ar pm_flags
to a
.Ic PMC_OP_PMCRW
request contained illegal flags.
.It Bq Er EINVAL
A
.Ic PMC_OP_PMCX86GETMSR
operation was requested for a PMC not in process-private mode, or
for a PMC that is not solely attached to its owner process, or for
a PMC that was allocated with flag
.Va PMC_F_DESCENDANTS .
.It Bq Er EINVAL
(On Intel Pentium 4 CPUs with HTT support) An allocation request for
a process-private PMC was issued for an event that does not support
counting on a per-logical CPU basis.
.It Bq Er ENOMEM
The system was not able to allocate kernel memory.
.It Bq Er ENOSYS
(i386 architectures) A
.Ic PMC_OP_PMCX86GETMSR
operation was requested for hardware that does not support reading
PMCs directly with the RDPMC instruction.
.It Bq Er ENXIO
A
.Ic PMC_OP_GETPMCINFO
operation was requested for a disabled CPU.
.It Bq Er ENXIO
A system-wide PMC on a disabled CPU was requested to be allocated with
.Ic PMC_OP_PMCALLOCATE .
.It Bq Er ENXIO
A
.Ic PMC_OP_PMCSTART
or
.Ic PMC_OP_PMCSTOP
request was issued for a system-wide PMC that was allocated on a
currently disabled CPU.
.It Bq Er EPERM
A
.Ic PMC_OP_PMCADMIN
request was issued by a process without super-user
privilege or by a jailed super-user process.
.It Bq Er EPERM
A
.Ic PMC_OP_PMCATTACH
operation was issued for a target process that the current process
does not have permission to attach to.
.It Bq Er EPERM
.Pq "i386 and amd64 architectures"
A
.Ic PMC_OP_PMCATTACH
operation was issued on a PMC whose MSR has been retrieved using
.Ic PMC_OP_PMCX86GETMSR .
.It Bq Er ESRCH
A process issued a PMC operation request without having allocated any
PMCs.
.It Bq Er ESRCH
A process issued a PMC operation request after the PMC was detached
from all of its target processes.
.It Bq Er ESRCH
A
.Ic PMC_OP_PMCATTACH
request specified a non-existent process id.
.It Bq Er ESRCH
The target process for a
.Ic PMC_OP_PMCDETACH
operation is not being monitored by the
.Nm
driver.
.El
.Sh BUGS
The driver samples the state of the kernel's logical processor support
at the time of initialization (i.e., at module load time).
On CPUs supporting logical processors, the driver could misbehave if
logical processors are subsequently enabled or disabled while the
driver is active.
.Sh SEE ALSO
.Xr kenv 1 ,
.Xr pmc 3 ,
.Xr kgmon 8 ,
.Xr kldload 8 ,
.Xr pmccontrol 8 ,
.Xr pmcstat 8 ,
.Xr sysctl 8 ,
.Xr p_candebug 9
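.Sh EXAMPLES
The bit 31 replication described under
.Sx Intel Pentium-Pro Handling
can be modelled in a few lines of C.
The sketch below is an illustration only and is not taken from the
.Nm
driver: all function names are hypothetical, and the 40 bit register
width is an assumption inferred from the statement that bit 31 is copied
into the upper 8 bits of the MSR.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a write to a Pentium-Pro style counter MSR.
 * Assumption: the register is 40 bits wide, the low 32 bits are
 * stored as written, and bit 31 is copied into bits 32..39.
 */
uint64_t
p6_msr_after_write(uint32_t value)
{
    uint64_t msr = value;

    if ((value & UINT32_C(0x80000000)) != 0)
        msr |= UINT64_C(0xFF) << 32;    /* replicate bit 31 upward */
    return (msr);
}

/* True if the register would hold exactly the value that was written. */
bool
p6_write_is_exact(uint32_t value)
{
    return (p6_msr_after_write(value) == (uint64_t)value);
}

/* Self-check of the model; aborts via assert() if it is inconsistent. */
void
p6_model_selfcheck(void)
{
    assert(p6_write_is_exact(UINT32_C(0x7FFFFFFF)));
    assert(!p6_write_is_exact(UINT32_C(0x80000000)));
    assert(p6_msr_after_write(UINT32_C(0x80000000)) ==
        UINT64_C(0xFF80000000));
}
```
.Ed
.Pp
Under this model any value below 2^31 is held unchanged, while a write
of 0x80000000 leaves 0xff80000000 in the register.
This is the sense in which the usable width of these PMCs is 31 bits.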