.\" Copyright (c) 2003-2008 Joseph Koshy. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd March 14, 2008
.Os
.Dt PMC 3
.Sh NAME
.Nm pmc
.Nd library for accessing hardware performance monitoring counters
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
The
.Lb libpmc
provides a programming interface that allows applications to use
hardware performance counters to gather performance data about
specific processes or for the system as a whole.
The library is implemented using the lower-level facilities offered by
the
.Xr hwpmc 4
driver.
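.Pp
As a rough illustration of this programming interface, the following
sketch counts the instructions retired by the calling process while it
runs a short loop.
It is not taken from the library sources and rests on several
assumptions: the five-argument form of
.Fn pmc_allocate
(other releases of the library may differ), the
.Dq Li instructions
event alias, the
.Dv PMC_MODE_TC
and
.Dv PMC_CPU_ANY
constants from
.In pmc.h ,
and a system on which the
.Xr hwpmc 4
driver is loaded.
The individual manual pages remain the authoritative source for the
function prototypes.
.Bd -literal -offset indent
#include <sys/types.h>

#include <err.h>
#include <pmc.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;
    pmc_id_t pmcid;
    pmc_value_t count;

    /* The library must be initialized before any other call. */
    if (pmc_init() < 0)
        err(EXIT_FAILURE, "pmc_init");

    /* Assumed argument order: event, mode, flags, cpu, &id. */
    if (pmc_allocate("instructions", PMC_MODE_TC, 0, PMC_CPU_ANY,
        &pmcid) < 0)
        err(EXIT_FAILURE, "pmc_allocate");

    /* A process scope PMC measures nothing until it is attached. */
    if (pmc_attach(pmcid, getpid()) < 0)
        err(EXIT_FAILURE, "pmc_attach");
    if (pmc_start(pmcid) < 0)
        err(EXIT_FAILURE, "pmc_start");

    /* The work being measured. */
    for (i = 0; i < 1000000; i++)
        sink += i;

    /* Counting PMCs are read explicitly by the application. */
    if (pmc_read(pmcid, &count) < 0)
        err(EXIT_FAILURE, "pmc_read");
    printf("instructions retired: %ju\en", (uintmax_t)count);

    (void)pmc_stop(pmcid);
    (void)pmc_release(pmcid);
    return (EXIT_SUCCESS);
}
.Ed
.Pp
A program of this kind would need to be linked against
.Lb libpmc .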
.Ss Key Concepts
Performance monitoring counters (PMCs) are represented by the library
using a software abstraction.
These
.Dq abstract
PMCs can have one of two scopes:
.Bl -bullet
.It
System scope.
These PMCs measure events in a whole-system manner, i.e., independent
of the currently executing thread.
System scope PMCs are allocated on specific CPUs and do not
migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
.Xr hwpmc 4
sysctl tunable
.Va security.bsd.unprivileged_syspmcs
is non-zero.
.It
Process scope.
These PMCs only measure hardware events when the processes they are
attached to are executing on a CPU.
In an SMP system, process scope PMCs migrate between CPUs along with
their target processes.
.El
.Pp
Orthogonal to PMC scope, PMCs may be allocated in one of two
operational modes:
.Bl -bullet
.It
Counting PMCs measure events according to their scope
(system or process).
The application needs to explicitly read these counters
to retrieve their value.
.It
Sampling PMCs cause the CPU to be periodically interrupted
and information about its state of execution to be collected.
Sampling PMCs are used to profile specific processes and kernel
threads or to profile the system as a whole.
.El
.Pp
The scope and operational mode for a software PMC are specified at
PMC allocation time.
An application is allowed to allocate multiple PMCs subject
to availability of hardware resources.
.Pp
The library uses human-readable strings to name the event being
measured by hardware.
The syntax used for specifying a hardware event along with additional
event specific qualifiers (if any) is described in detail in section
.Sx "EVENT SPECIFIERS"
below.
.Pp
PMCs are associated with the process that allocated them and
will be automatically reclaimed by the system when the process exits.
Additionally, process-scope PMCs have to be attached to one or more
target processes before they can perform measurements.
A process-scope PMC may be attached to those target processes
that its owner process would otherwise be permitted to debug.
An owner process may attach PMCs to itself, allowing
it to measure its own behavior.
Additionally, on some machine architectures, such self-attached PMCs
may be read cheaply using specialized instructions supported by the
processor.
.Pp
Certain kinds of PMCs require that a log file be configured before
they may be started.
These include:
.Bl -bullet -compact
.It
System scope sampling PMCs.
.It
Process scope sampling PMCs.
.It
Process scope counting PMCs that have been configured to report PMC
readings on process context switches or process exits.
.El
Up to one log file may be configured per owner process.
Events logged to a log file may be subsequently analyzed using the
.Xr pmclog 3
family of functions.
.Ss Supported CPUs
The CPUs known to the PMC library are named by the
.Vt "enum pmc_cputype"
enumeration.
Supported CPUs include:
.Bl -tag -width PMC_CPU_INTEL_PIII -compact
.It PMC_CPU_AMD_K7
.Tn "AMD Athlon"
CPUs.
.It PMC_CPU_AMD_K8
.Tn "AMD Athlon64"
CPUs.
.It PMC_CPU_INTEL_P5
.Tn Intel
.Tn "Pentium"
CPUs.
.It PMC_CPU_INTEL_P6
.Tn Intel
.Tn "Pentium Pro"
CPUs.
.It PMC_CPU_INTEL_PII
.Tn "Intel Pentium II"
CPUs.
.It PMC_CPU_INTEL_PIII
.Tn "Intel Pentium III"
CPUs.
.It PMC_CPU_INTEL_PM
.Tn "Intel Pentium M"
CPUs.
.It PMC_CPU_INTEL_PIV
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss Supported PMCs
PMCs supported by this library are named by the
.Vt enum pmc_class
enumeration.
Supported PMC kinds include:
.Bl -tag -width PMC_CLASS_TSC -compact
.It PMC_CLASS_TSC
The timestamp counter on i386 and amd64 architecture CPUs.
.It PMC_CLASS_K7
Programmable hardware counters present in
.Tn "AMD Athlon"
CPUs.
.It PMC_CLASS_K8
Programmable hardware counters present in
.Tn "AMD Athlon64"
CPUs.
.It PMC_CLASS_P5
Programmable hardware counters present in
.Tn Intel
.Tn Pentium
CPUs.
.It PMC_CLASS_P6
Programmable hardware counters present in
.Tn Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn "Pentium III" ,
.Tn "Celeron" ,
and
.Tn "Pentium M"
CPUs.
.It PMC_CLASS_P4
Programmable hardware counters present in
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss PMC Capabilities
Capabilities of performance monitoring hardware are denoted using
the
.Vt "enum pmc_caps"
enumeration.
Supported capabilities include:
.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
.It PMC_CAP_EDGE
The ability to count negated to asserted transitions of the hardware
conditions being probed for.
.It PMC_CAP_INTERRUPT
The ability to interrupt the CPU.
.It PMC_CAP_INVERT
The ability to invert the sense of the hardware conditions being
measured.
.It PMC_CAP_READ
PMC hardware allows the CPU to read performance counters.
.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
system dependent way.
.It PMC_CAP_SYSTEM
The ability to restrict counting of hardware events to when the CPU is
running privileged code.
.It PMC_CAP_THRESHOLD
The ability to ignore simultaneous hardware events below a
programmable threshold.
.It PMC_CAP_USER
The ability to restrict counting of hardware events to when the
CPU is running unprivileged code.
.It PMC_CAP_WRITE
PMC hardware allows the CPU to write to performance counters.
.El
.Ss Functional Grouping
This section contains a brief overview of the available functionality
in the PMC library.
Each function listed here is described further in its own manual page.
.Bl -tag -width indent
.It Administration
.Bl -tag -compact
.It Fn pmc_disable , Fn pmc_enable
Administratively disable (enable) specific performance monitoring
counter hardware.
Counters that are disabled will not be available to applications to
use.
.El
.It "Convenience Functions"
.Bl -tag -compact
.It Fn pmc_event_names_of_class
Returns a list of event names supported by a given PMC type.
.It Fn pmc_name_of_capability
Convert a
.Dv PMC_CAP_*
flag to a human-readable string.
.It Fn pmc_name_of_class
Convert a
.Dv PMC_CLASS_*
constant to a human-readable string.
.It Fn pmc_name_of_cputype
Return a human-readable name for a CPU type.
.It Fn pmc_name_of_disposition
Return a human-readable string describing a PMC's disposition.
.It Fn pmc_name_of_event
Convert a numeric event code to a human-readable string.
.It Fn pmc_name_of_mode
Convert a
.Dv PMC_MODE_*
constant to a human-readable name.
.It Fn pmc_name_of_state
Return a human-readable string describing a PMC's current state.
.El
.It "Library Initialization"
.Bl -tag -compact
.It Fn pmc_init
Initialize the library.
This function must be called before any other library function.
.El
.It "Log File Handling"
.Bl -tag -compact
.It Fn pmc_configure_logfile
Configure a log file for
.Xr hwpmc 4
to write logged events to.
.It Fn pmc_flush_logfile
Flush all pending log data in
.Xr hwpmc 4 Ns Ap s
buffers.
.It Fn pmc_writelog
Append arbitrary user data to the current log file.
.El
.It "PMC Management"
.Bl -tag -compact
.It Fn pmc_allocate , Fn pmc_release
Allocate (free) a PMC.
.It Fn pmc_attach , Fn pmc_detach
Attach (detach) a process scope PMC to a target.
.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
Read (write) a value from (to) a PMC.
.It Fn pmc_start , Fn pmc_stop
Start (stop) a software PMC.
.It Fn pmc_set
Set the reload value for a sampling PMC.
.El
.It "Queries"
.Bl -tag -compact
.It Fn pmc_capabilities
Retrieve the capabilities for a given PMC.
.It Fn pmc_cpuinfo
Retrieve information about the CPUs and PMC hardware present in the
system.
.It Fn pmc_get_driver_stats
Retrieve statistics maintained by
.Xr hwpmc 4 .
.It Fn pmc_ncpu
Determine the number of CPUs in the system.
.It Fn pmc_npmc
Return the number of hardware PMCs present in a given CPU.
.It Fn pmc_pmcinfo
Return information about the state of a given CPU's PMCs.
.It Fn pmc_width
Determine the width of a hardware counter in bits.
.El
.It "x86 Architecture Specific API"
.Bl -tag -compact
.It Fn pmc_get_msr
Returns the processor model specific register number
associated with
.Fa pmc .
Applications may then use the x86
.Ic RDPMC
instruction to directly read the contents of the PMC.
.El
.El
.Ss Signal Handling Requirements
Applications using PMCs are required to handle the following signals:
.Bl -tag -width ".Dv SIGBUS"
.It Dv SIGBUS
When the
.Xr hwpmc 4
module is unloaded using
.Xr kldunload 8 ,
processes that have PMCs allocated to them will be sent a
.Dv SIGBUS
signal.
.It Dv SIGIO
The
.Xr hwpmc 4
driver will send a PMC owning process a
.Dv SIGIO
signal if:
.Bl -bullet
.It
Any process scope PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
configured log file.
This error may be retrieved by a subsequent call to
.Fn pmc_flush_logfile .
.El
.El
.Ss Typical Program Flow
.Bl -enum
.It
An application would first invoke function
.Fn pmc_init
to allow the library to initialize itself.
.It
Signal handling would then be set up.
.It
Next the application would allocate the PMCs it desires using function
.Fn pmc_allocate .
.It
Initial values for PMCs may be set using function
.Fn pmc_set .
.It
If a log file is necessary for the PMCs to work, it would
be configured using function
.Fn pmc_configure_logfile .
.It
Process scope PMCs would then be attached to their target processes
using function
.Fn pmc_attach .
.It
The PMCs would then be started using function
.Fn pmc_start .
.It
Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
For PMCs that write events to the log file, this logged data would be
read and parsed using the
.Xr pmclog 3
family of functions.
.It
PMCs are stopped using function
.Fn pmc_stop ,
and process scope PMCs are detached from their targets using
function
.Fn pmc_detach .
.It
Before the process exits, it may release its PMCs using function
.Fn pmc_release .
Any configured log file may be closed using function
.Fn pmc_configure_logfile .
.El
.Sh EVENT SPECIFIERS
Event specifiers are strings comprising an event name, followed by
optional parameters modifying the semantics of the hardware event
being probed.
Event names are PMC architecture dependent, but the PMC library defines
machine independent aliases for commonly used events.
.Ss Event Name Aliases
Event name aliases are CPU architecture independent names for commonly
used events.
The following aliases are known to this version of the
.Nm pmc
library:
.Bl -tag -width indent
.It Li branches
Measure the number of branches retired.
.It Li branch-mispredicts
Measure the number of retired branches that were mispredicted.
.It Li cycles
Measure processor cycles.
This event is implemented using the processor's Time Stamp Counter
register.
.It Li dc-misses
Measure the number of data cache misses.
.It Li ic-misses
Measure the number of instruction cache misses.
.It Li instructions
Measure the number of instructions retired.
.It Li interrupts
Measure the number of interrupts seen.
.It Li unhalted-cycles
Measure the number of cycles the processor is not in a halted
or sleep state.
.El
.Ss Time Stamp Counter (TSC)
The timestamp counter is a monotonically non-decreasing counter that
counts processor cycles.
.Pp
In the i386 architecture, this counter may
be selected by requesting an event with event specifier
.Dq Li tsc .
The
.Dq Li tsc
event does not support any further qualifiers.
It can only be allocated in system-wide counting mode,
and is a read-only counter.
Multiple processes are allowed to allocate the TSC.
Once allocated, it may be read using the
.Fn pmc_read
function, or by using the RDTSC instruction.
.Ss AMD (K7) PMCs
These PMCs are present in the
.Tn "AMD Athlon"
series of CPUs and are documented in:
.Rs
.%B "AMD Athlon Processor x86 Code Optimization Guide"
.%N "Publication No. 22007"
.%D "February 2002"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K7 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li unitmask= Ns Ar mask
This qualifier is used to further qualify a select few events,
.Dq Li k7-dc-refills-from-l2 ,
.Dq Li k7-dc-refills-from-system
and
.Dq Li k7-dc-writebacks .
Here
.Ar mask
is a string of the following characters optionally separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li m
Count operations for lines in the
.Dq Modified
state.
.It Li o
Count operations for lines in the
.Dq Owner
state.
.It Li e
Count operations for lines in the
.Dq Exclusive
state.
.It Li s
Count operations for lines in the
.Dq Shared
state.
.It Li i
Count operations for lines in the
.Dq Invalid
state.
.El
.Pp
If no
.Dq Li unitmask
qualifier is specified, the default is to count events for cache
lines in any of the above states.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K7 PMCs are:
.Bl -tag -width indent
.It Li k7-dc-accesses
Count data cache accesses.
.It Li k7-dc-misses
Count data cache misses.
.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
Count data cache refills from L2 cache.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
Count data cache refills from system memory.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
Count data cache writebacks.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
Count L1 DTLB misses and L2 DTLB hits.
.It Li k7-l1-and-l2-dtlb-misses
Count L1 and L2 DTLB misses.
.It Li k7-misaligned-references
Count misaligned data references.
.It Li k7-ic-fetches
Count instruction cache fetches.
.It Li k7-ic-misses
Count instruction cache misses.
.It Li k7-l1-itlb-misses
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k7-l1-l2-itlb-misses
Count L1 (and L2) ITLB misses.
.It Li k7-retired-instructions
Count all retired instructions.
.It Li k7-retired-ops
Count retired ops.
.It Li k7-retired-branches
Count all retired branches (conditional, unconditional, exceptions
and interrupts).
.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
.It Li k7-retired-taken-branches
Count retired taken branches.
.It Li k7-retired-taken-branches-mispredicted
Count mispredicted taken branches that were retired.
.It Li k7-retired-far-control-transfers
Count retired far control transfers.
.It Li k7-retired-resync-branches
Count retired resync branches (non control transfer branches).
.It Li k7-interrupts-masked-cycles
Count the number of cycles when the processor's
.Va IF
flag was zero.
.It Li k7-interrupts-masked-while-pending-cycles
Count the number of cycles interrupts were masked while pending due
to the processor's
.Va IF
flag being zero.
.It Li k7-hardware-interrupts
Count the number of taken hardware interrupts.
.El
.Ss AMD (K8) PMCs
These PMCs are present in the
.Tn "AMD Athlon64"
and
.Tn "AMD Opteron"
series of CPUs.
They are documented in:
.Rs
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%D "April 2004"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K8 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
STPCLK states.
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cancelled
Count cancelled requests.
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tag-snoop
Count tag snoop requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-dc-access
Count data cache accesses including microcode scratchpad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
vintage.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li accesses
Count data cache accesses by lock instructions.
.It Li misses
Count data cache misses by lock instructions.
.El
.Pp
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li load
Count load operations.
.It Li nta
Count non-temporal operations.
.It Li store
Count store operations.
.El
.Pp
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
.It Li k8-dc-miss
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li scrubber
Count scrubber detected errors.
.It Li piggyback
Count piggyback scrubber errors.
.El
.Pp
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
.El
.Pp
The default is to count all types of ops.
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
was empty).
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retiral.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
is pending.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
.El
.Pp
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., cycles when the CPU
RFLAGS field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li low-op-pos-0
Count instructions with the low op in position 0.
.It Li low-op-pos-1
Count instructions with the low op in position 1.
.It Li low-op-pos-2
Count instructions with the low op in position 2.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmx-3dnow
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
.It Li x87
Count x87 instructions.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
address miscompare.
.It Li k8-fr-retired-uops
Count retired uops.
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
.It Li k8-ic-fetch
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ic-miss
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Pp
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
.El
.Pp
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
unit.
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cs
Count CS register loads.
.It Li ds
Count DS register loads.
.It Li es
Count ES register loads.
.It Li fs
Count FS register loads.
.It Li gs
Count GS register loads.
.\" .It Li hs
.\" Count HS register loads.
.\" XXX "HS" register?
.It Li ss
Count SS register loads.
.El
.Pp
The default is to count all types of loads.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypass.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypass.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.El
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li page-conflict
Count page conflicts.
.It Li page-hit
Count page hits.
.It Li page-miss
Count page misses.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
Count probe events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li probe-hit
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits with memory cancels.
.It Li probe-miss
Count probe misses.
.El
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
.It Li postwrszbyte
.It Li postwrszdword
.It Li rdszbyte
.It Li rdszdword
.It Li rdmodwr
.El
.Pp
The default is to count all types of commands.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
.It Li command
Count command messages sent.
.It Li data
Count data messages sent.
.It Li nop
Count nop messages sent.
.El
.Pp
The default is to count all types of messages.
.El
.Ss Intel Pentium PMCS
Intel Pentium PMCs are present in Intel
.Tn Pentium
and
.Tn "Pentium MMX"
processors.
.Pp
These CPUs have two counters.
Some events may only be used on specific counters and some events
are defined only on processors supporting the MMX instruction set.
.Pp
These PMCs are documented in
.Rs
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
.%T "Volume 3B: System Programming Guide, Part 2"
.%N "Order Number 253669-024US"
.%D "August 2007"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel Pentium PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li duration
Count duration (in clocks) of events.
The default is to count events.
.It Li os
Measure events at privilege levels 0, 1 and 2.
.It Li overflow
Assert the external processor pin associated with a counter on counter
overflow.
.It Li usr
Measure events at privilege level 3.
.El
.Pp
Note that these PMCs do not have the ability to interrupt the CPU.
.Pp
The event specifiers supported by Intel Pentium PMCs are:
.Bl -tag -width indent
.It Li p5-any-segment-register-loaded
The number of writes to any segment register, including the LDTR,
GDTR, TR and IDTR.
Far control transfers and task switches that involve privilege
level changes will count this event twice.
.It Li p5-bank-conflicts
The number of actual bank conflicts.
.It Li p5-branches
The number of taken and not taken branches including branches, jumps, calls,
software interrupts and interrupt returns.
.It Li p5-breakpoint-match-on-dr0-register
The number of matches on the DR0 breakpoint register.
.It Li p5-breakpoint-match-on-dr1-register
The number of matches on the DR1 breakpoint register.
.It Li p5-breakpoint-match-on-dr2-register
The number of matches on the DR2 breakpoint register.
.It Li p5-breakpoint-match-on-dr3-register
The number of matches on the DR3 breakpoint register.
.It Li p5-btb-false-entries
.Pq Tn Pentium MMX
The number of false entries in the BTB.
This event is only allocated on counter 0.
.It Li p5-btb-hits
The number of branches executed that hit in the branch target buffer.
.It Li p5-btb-miss-prediction-on-not-taken-branch
.Pq Tn Pentium MMX
The number of times the BTB predicted a not-taken branch as taken.
This event is only allocated on counter 1.
.It Li p5-bus-cycle-duration
The number of cycles while a bus cycle was in progress.
.It Li p5-bus-ownership-latency
.Pq Tn Pentium MMX
The time from bus ownership being requested to ownership being granted.
This event is only allocated on counter 0.
.It Li p5-bus-ownership-transfers
.Pq Tn Pentium MMX
The number of bus ownership transfers.
This event is only allocated on counter 1.
.It Li p5-bus-utilization-due-to-processor-activity
.Pq Tn Pentium MMX
The number of clocks the bus is busy due to the processor's own
activity.
This event is only allocated on counter 0.
.It Li p5-cache-line-sharing
.Pq Tn Pentium MMX
The number of shared data lines in L1 cache.
This event is only allocated on counter 1.
.It Li p5-cache-m-state-line-sharing
.Pq Tn Pentium MMX
The number of hits to an M-state line due to a memory access by
another processor.
This event is only allocated on counter 0.
.It Li p5-code-cache-miss
The number of instruction reads that miss the internal code cache.
Both cacheable and uncacheable misses are counted.
.It Li p5-code-read
The number of instruction reads to both cacheable and uncacheable regions.
.It Li p5-code-tlb-miss
The number of instruction reads that miss the instruction TLB.
Both cacheable and uncacheable reads are counted.
.It Li p5-d1-starvation-and-fifo-is-empty
.Pq Tn Pentium MMX
The number of times the D1 stage cannot issue any instructions because
the FIFO was empty.
This event is only allocated on counter 0.
.It Li p5-d1-starvation-and-only-one-instruction-in-fifo
.Pq Tn Pentium MMX
The number of times the D1 stage could issue only one instruction
because the FIFO had one instruction ready.
This event is only allocated on counter 1.
.It Li p5-data-cache-lines-written-back
The number of data cache lines that are written back, including
those caused by internal and external snoops.
.It Li p5-data-cache-tlb-miss-stall-duration
.Pq Tn Pentium MMX
The number of clocks the pipeline is stalled due to a data cache
TLB miss.
This event is only allocated on counter 1.
.It Li p5-data-read
The number of memory data reads, counting internal data cache hits and
misses.
I/O and data memory accesses due to TLB miss processing are
not included.
Split cycle reads are counted individually.
.It Li p5-data-read-miss
The number of memory read accesses that miss the data cache, counting
both cacheable and uncacheable accesses.
Data accesses that are part of TLB miss processing are not included.
I/O accesses are not included.
.It Li p5-data-read-miss-or-write-miss
The number of data reads and writes that miss the internal data cache,
counting uncacheable accesses.
Data accesses due to TLB miss processing are not counted.
.It Li p5-data-read-or-write
The number of data reads and writes including internal data cache hits
and misses.
Data reads due to TLB miss processing are not counted.
.It Li p5-data-tlb-miss
The number of misses to the data cache translation lookaside buffer.
.It Li p5-data-write
The number of memory data writes, counting internal data cache hits
and misses.
I/O is not included and split cycle writes are counted individually.
.It Li p5-data-write-miss
The number of memory write accesses that miss the data cache, counting
both cacheable and uncacheable accesses.
I/O accesses are not counted.
.It Li p5-emms-instructions-executed
.Pq Tn Pentium MMX
The number of EMMS instructions executed.
This event is only allocated on counter 0.
.It Li p5-external-data-cache-snoop-hits
The number of external snoops to the data cache that hit a valid line,
or the data line fill buffer, or one of the write back buffers.
.It Li p5-external-snoops
The number of external snoop requests accepted, including snoops that
hit in the code cache, the data cache and that hit in neither.
.It Li p5-floating-point-stalls-duration
.Pq Tn Pentium MMX
The number of cycles the pipeline is stalled due to a floating point
freeze.
This event is only allocated on counter 0.
.It Li p5-flops
The number of floating point adds, subtracts, multiplies, divides and
square roots.
Transcendental instructions trigger this event multiple times.
Instructions generating divide-by-zero, negative square root, special
operand and stack exceptions are not counted.
Integer multiply instructions that use the x87 FPU are counted.
.It Li p5-full-write-buffer-stall-duration-while-executing-mmx-instructions
.Pq Tn Pentium MMX
The number of clocks the pipeline has stalled due to full write
buffers when executing MMX instructions.
This event is only allocated on counter 0.
.It Li p5-hardware-interrupts
The number of taken INTR and NMI interrupts.
.It Li p5-instructions-executed
The number of instructions executed.
Repeat prefixed instructions are counted only once.
The HLT instruction is counted only once, irrespective of the number
of cycles spent in the halted state.
All hardware and software exceptions are counted as instructions, and
fault handler invocations are also counted as instructions.
.It Li p5-instructions-executed-v-pipe
The number of instructions that executed in the V pipe.
.It Li p5-io-read-or-write-cycle
The number of bus cycles directed to I/O space.
.It Li p5-locked-bus-cycle
The number of locked bus cycles that occur on account of the lock
prefixes, LOCK instructions, page table updates and descriptor table
updates.
.It Li p5-memory-accesses-in-both-pipes
The number of data memory reads or writes that are paired in both pipes.
.It Li p5-misaligned-data-memory-on-mmx-instructions
.Pq Tn Pentium MMX
The number of misaligned data memory references when executing MMX
instructions.
This event is only allocated on counter 0.
.It Li p5-misaligned-data-memory-or-io-references
The number of memory or I/O reads or writes that are not aligned on
natural boundaries.
2- and 4-byte accesses are counted as misaligned if they cross a 4
byte boundary.
.It Li p5-mispredicted-or-unpredicted-returns
.Pq Tn Pentium MMX
The number of returns predicted incorrectly or not at all, only
counting RET instructions.
This event is only allocated on counter 0.
.It Li p5-mmx-instruction-data-read-misses
.Pq Tn Pentium MMX
The number of MMX instruction data read misses.
This event is only allocated on counter 1.
.It Li p5-mmx-instruction-data-reads
.Pq Tn Pentium MMX
The number of MMX instruction data reads.
This event is only allocated on counter 0.
.It Li p5-mmx-instruction-data-write-misses
.Pq Tn Pentium MMX
The number of data write misses caused by MMX instructions.
This event is only allocated on counter 1.
.It Li p5-mmx-instruction-data-writes
.Pq Tn Pentium MMX
The number of data writes caused by MMX instructions.
This event is only allocated on counter 0.
.It Li p5-mmx-instructions-executed-u-pipe
.Pq Tn Pentium MMX
The number of MMX instructions executed in the U pipe.
This event is only allocated on counter 0.
.It Li p5-mmx-instructions-executed-v-pipe
.Pq Tn Pentium MMX
The number of MMX instructions executed in the V pipe.
This event is only allocated on counter 1.
.It Li p5-mmx-multiply-unit-interlock
.Pq Tn Pentium MMX
The number of clocks the pipeline is stalled because the destination
of a prior MMX multiply is not ready.
This event is only allocated on counter 0.
.It Li p5-movd-movq-store-stall-due-to-previous-mmx-operation
.Pq Tn Pentium MMX
The number of clocks a MOVD/MOVQ instruction stalled in the D2 stage
of the pipeline due to a previous MMX instruction.
This event is only allocated on counter 1.
.It Li p5-noncacheable-memory-reads
The number of bus cycles for non-cacheable instruction or data reads,
including cycles caused by TLB misses.
.It Li p5-number-of-cycles-not-in-halt-state
.Pq Tn Pentium MMX
The number of cycles the processor is not idle due to the HLT
instruction.
This event is only allocated on counter 0.
.It Li p5-pipeline-agi-stalls
The number of address generation interlock stalls.
An AGI that occurs in both the U and V pipelines in the same clock
signals the event twice.
.It Li p5-pipeline-flushes
The number of pipeline flushes that occur.
Pipeline flushes are caused by branch mispredicts, exceptions,
interrupts, some segment register loads, and BTB misses.
Prefetch queue flushes due to serializing instructions are not
counted.
.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions
.Pq Tn Pentium MMX
The number of pipeline flushes due to wrong branch predictions
resolved in either the E- or WB-stage of the pipeline.
This event is only allocated on counter 0.
.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions-resolved-in-wb-stage
.Pq Tn Pentium MMX
The number of pipeline flushes due to wrong branch predictions
resolved in the WB-stage of the pipeline.
This event is only allocated on counter 1.
.It Li p5-pipeline-stall-for-mmx-instruction-data-memory-reads
.Pq Tn Pentium MMX
The number of clocks during pipeline stalls caused by waiting MMX data
memory reads.
This event is only allocated on counter 0.
.It Li p5-predicted-returns
.Pq Tn Pentium MMX
The number of predicted returns, whether correct or incorrect.
This counter only counts RET instructions.
This event is only allocated on counter 1.
.It Li p5-returns
.Pq Tn Pentium MMX
The number of RET instructions executed.
This event is only allocated on counter 0.
.It Li p5-saturating-mmx-instructions-executed
.Pq Tn Pentium MMX
The number of saturating MMX instructions executed.
This event is only allocated on counter 0.
.It Li p5-saturations-performed
.Pq Tn Pentium MMX
The number of saturating MMX instructions executed when at least one
of its results was actually saturated.
This event is only allocated on counter 1.
.It Li p5-stall-on-mmx-instruction-write-to-e-o-m-state-line
.Pq Tn Pentium MMX
The number of clocks during stalls on MMX instructions writing to
E- or M-state cache lines.
This event is only allocated on counter 1.
.It Li p5-stall-on-write-to-an-e-or-m-state-line
The number of stalls on a write to an exclusive or modified data cache
line.
.It Li p5-taken-branch-or-btb-hit
The number of events that may cause a hit in the BTB, namely either
taken branches or BTB hits.
.It Li p5-taken-branches
.Pq Tn Pentium MMX
The number of taken branches.
This event is only allocated on counter 1.
.It Li p5-transitions-between-mmx-and-fp-instructions
.Pq Tn Pentium MMX
The number of transitions between MMX and floating-point instructions
and vice-versa.
This event is only allocated on counter 1.
.It Li p5-waiting-for-data-memory-read-stall-duration
The number of clocks the pipeline was stalled waiting for data
memory reads.
Data TLB miss processing is included in this count.
.It Li p5-write-buffer-full-stall-duration
The number of clocks while the pipeline was stalled due to write
buffers being full.
.It Li p5-write-hit-to-m-or-e-state-lines
The number of writes that hit exclusive or modified lines in the data
cache.
.It Li p5-writes-to-noncacheable-memory
.Pq Tn Pentium MMX
The number of writes to non-cacheable memory, including write cycles
caused by TLB misses and I/O writes.
This event is only allocated on counter 1.
.El
.Ss Intel P6 PMCS
Intel P6 PMCs are present in Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn Celeron ,
.Tn "Pentium III"
and
.Tn "Pentium M"
processors.
.Pp
These CPUs have two counters.
Some events may only be used on specific counters and some events are
defined only on specific processor models.
.Pp
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
.Pp
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) III Processor Specification Update"
.%N "Document Number: 244453-054"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel P6 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li cmask= Ns Ar value
Configure the PMC to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the PMC to count the number of deasserted to asserted
transitions of the conditions expressed by the other qualifiers.
If specified, the counter will increment only once whenever a
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li cmask
qualifier is present, making the counter increment when the number of
events per cycle is less than the value specified by the
.Dq Li cmask
qualifier.
.It Li os
Configure the PMC to count events happening at processor privilege
level 0.
.It Li umask= Ns Ar value
This qualifier is used to further qualify the event selected (see
below).
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
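.Pp
The common qualifiers above are combined with an event name to form a
single specifier string.
The following C fragment is a minimal sketch of that assembly step.
It assumes that qualifiers are appended to the event name as a
comma-separated list, as the ",umask=" notation in the event entries
suggests; the EVENT SPECIFIERS section remains the authoritative
description of the syntax.
The helper build_event_spec() is hypothetical and is not part of libpmc.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of libpmc: join an event name and a
 * list of qualifiers into "event,qual1,qual2,...".
 * Assumes qualifiers are comma-separated.
 * Returns 0 on success, or -1 if the result does not fit in buf.
 */
static int
build_event_spec(char *buf, size_t len, const char *event,
    const char *const *quals, size_t nquals)
{
	size_t used, i;
	int n;

	/* Start with the event name itself. */
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;

	/* Append each qualifier, preceded by a comma. */
	for (i = 0; i < nquals; i++) {
		n = snprintf(buf + used, len - used, ",%s", quals[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

Called with the event name p6-bus-tran-any and the qualifiers umask=any,
cmask=2, inv and usr, the helper would leave the string
"p6-bus-tran-any,umask=any,cmask=2,inv,usr" in the buffer.
A string of this form is what an application would then pass to
.Xr pmc_allocate 3 .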
.Pp
The event specifiers supported by Intel P6 PMCs are:
.Bl -tag -width indent
.It Li p6-baclears
Count the number of times a static branch prediction was made by the
branch decoder because the BTB did not have a prediction.
.It Li p6-br-bac-missp-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
mispredicted at the Front End (BAC).
.It Li p6-br-bogus
Count the number of bogus branches.
.It Li p6-br-call-exec
.Pq Tn "Pentium M"
Count the number of call instructions executed.
.It Li p6-br-call-missp-exec
.Pq Tn "Pentium M"
Count the number of call instructions executed that were mispredicted.
.It Li p6-br-cnd-exec
.Pq Tn "Pentium M"
Count the number of conditional branch instructions executed.
.It Li p6-br-cnd-missp-exec
.Pq Tn "Pentium M"
Count the number of conditional branch instructions executed that were
mispredicted.
.It Li p6-br-ind-call-exec
.Pq Tn "Pentium M"
Count the number of indirect call instructions executed.
.It Li p6-br-ind-exec
.Pq Tn "Pentium M"
Count the number of indirect branch instructions executed.
.It Li p6-br-ind-missp-exec
.Pq Tn "Pentium M"
Count the number of indirect branch instructions executed that were
mispredicted.
.It Li p6-br-inst-decoded
Count the number of branch instructions decoded.
.It Li p6-br-inst-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed but not necessarily
retired.
.It Li p6-br-inst-retired
Count the number of branch instructions retired.
.It Li p6-br-miss-pred-retired
Count the number of mispredicted branch instructions retired.
.It Li p6-br-miss-pred-taken-ret
Count the number of taken mispredicted branches retired.
.It Li p6-br-missp-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
mispredicted at execution.
.It Li p6-br-ret-bac-missp-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed that were
mispredicted at the Front End (BAC).
.It Li p6-br-ret-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed.
.It Li p6-br-ret-missp-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed that were
mispredicted at execution.
.It Li p6-br-taken-retired
Count the number of taken branches retired.
.It Li p6-btb-misses
Count the number of branches for which the BTB did not produce a
prediction.
.It Li p6-bus-bnr-drv
Count the number of bus clock cycles during which this processor is
driving the BNR# pin.
.It Li p6-bus-data-rcv
Count the number of bus clock cycles during which this processor is
receiving data.
.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which DRDY# is asserted.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-hit-drv
Count the number of bus clock cycles during which this processor is
driving the HIT# pin.
.It Li p6-bus-hitm-drv
Count the number of bus clock cycles during which this processor is
driving the HITM# pin.
.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which LOCK# is asserted on the
external system bus.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-req-outstanding
Count the number of bus requests outstanding in any given cycle.
.It Li p6-bus-snoop-stall
Count the number of clock cycles during which the bus is snoop stalled.
.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
Count the number of completed bus transactions of any kind.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
Count the number of burst read transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
Count the number of completed burst transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
Count the number of completed deferred transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
Count the number of completed instruction fetch transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
Count the number of completed invalidate transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
Count the number of completed memory transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
Count the number of completed partial write transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
Count the number of completed read-for-ownership transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
Count the number of completed I/O transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
Count the number of completed partial transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
Count the number of completed write-back transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-cpu-clk-unhalted
Count the number of cycles during which the processor was not halted.
.Pp
.Pq Tn "Pentium M"
Count the number of cycles during which the processor was not halted
and not in a thermal trip.
.It Li p6-cycles-div-busy
Count the number of cycles during which the divider is busy and cannot
accept new divides.
This event is only allocated on counter 0.
.It Li p6-cycles-in-pending-and-masked
Count the number of processor cycles for which interrupts were
disabled and interrupts were pending.
.It Li p6-cycles-int-masked
Count the number of processor cycles for which interrupts were
disabled.
.It Li p6-data-mem-refs
Count all loads and all stores using any memory type, including
internal retries.
Each part of a split store is counted separately.
.It Li p6-dcu-lines-in
Count the total lines allocated in the data cache unit.
.It Li p6-dcu-m-lines-in
Count the number of M state lines allocated in the data cache unit.
.It Li p6-dcu-m-lines-out
Count the number of M state lines evicted from the data cache unit.
.It Li p6-dcu-miss-outstanding
Count the weighted number of cycles while a data cache unit miss is
outstanding, incremented by the number of outstanding cache misses at
any time.
.It Li p6-div
Count the number of integer and floating-point divides including
speculative divides.
This event is only allocated on counter 1.
.It Li p6-emon-esp-uops
.Pq Tn "Pentium M"
Count the total number of micro-ops.
.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of
.Tn "Enhanced Intel SpeedStep"
transitions.
An additional qualifier may be specified, and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all transitions.
.It Li freq
Count only frequency transitions.
.El
.Pp
The default is to count all transitions.
.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of retired fused micro-ops.
An additional qualifier may be specified, and may be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all fused micro-ops.
.It Li loadop
Count only load and op micro-ops.
.It Li stdsta
Count only STD/STA micro-ops.
.El
.Pp
The default is to count all fused micro-ops.
.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE computational instructions retired.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li packed-and-scalar
Count packed and scalar operations.
.It Li scalar
Count scalar operations only.
.El
.Pp
The default is to count packed and scalar operations.
.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE instructions retired.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li packed-and-scalar
Count packed and scalar operations.
.It Li scalar
Count scalar operations only.
.El
.Pp
The default is to count packed and scalar operations.
.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE prefetch or weakly ordered instructions
dispatched (including speculative prefetches).
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nta
Count non-temporal prefetches.
.It Li t1
Count prefetches to L1.
.It Li t2
Count prefetches to L2.
.It Li wos
Count weakly ordered stores.
.El
.Pp
The default is to count non-temporal prefetches.
.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of prefetch or weakly ordered instructions that miss
all caches.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nta
Count non-temporal prefetches.
.It Li t1
Count prefetches to L1.
.It Li t2
Count prefetches to L2.
.It Li wos
Count weakly ordered stores.
.El
.Pp
The default is to count non-temporal prefetches.
.It Li p6-emon-pref-rqsts-dn
.Pq Tn "Pentium M"
Count the number of downward prefetches issued.
.It Li p6-emon-pref-rqsts-up
.Pq Tn "Pentium M"
Count the number of upward prefetches issued.
.It Li p6-emon-simd-instr-retired
.Pq Tn "Pentium M"
Count the number of retired
.Tn MMX
instructions.
.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of computational SSE instructions retired.
An additional qualifier may be specified and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-packed-single
Count SSE packed-single instructions.
.It Li sse-scalar-single
Count SSE scalar-single instructions.
.It Li sse2-packed-double
Count SSE2 packed-double instructions.
.It Li sse2-scalar-double
Count SSE2 scalar-double instructions.
.El
.Pp
The default is to count SSE packed-single instructions.
.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of SSE instructions retired.
An additional qualifier can be specified, and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-packed-single
Count SSE packed-single instructions.
.It Li sse-packed-single-scalar-single
Count SSE packed-single and scalar-single instructions.
.It Li sse2-packed-double
Count SSE2 packed-double instructions.
.It Li sse2-scalar-double
Count SSE2 scalar-double instructions.
.El
.Pp
The default is to count SSE packed-single instructions.
.It Li p6-emon-synch-uops
.Pq Tn "Pentium M"
Count the number of sync micro-ops.
.It Li p6-emon-thermal-trip
.Pq Tn "Pentium M"
Count the duration or occurrences of thermal trips.
Use the
.Dq Li edge
qualifier to count occurrences of thermal trips.
.It Li p6-emon-unfusion
.Pq Tn "Pentium M"
Count the number of unfusion events in the reorder buffer.
.It Li p6-flops
Count the number of computational floating point operations retired.
This event is only allocated on counter 0.
.It Li p6-fp-assist
Count the number of floating point exceptions handled by microcode.
This event is only allocated on counter 1.
.It Li p6-fp-comps-ops-exe
Count the number of computational floating point operations executed.
This event is only allocated on counter 0.
.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of transitions between MMX and floating-point
instructions.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmxtofp
Count transitions from MMX instructions to floating-point instructions.
.It Li fptommx
Count transitions from floating-point instructions to MMX instructions.
.El
.Pp
The default is to count MMX to floating-point transitions.
.It Li p6-hw-int-rx
Count the number of hardware interrupts received.
.It Li p6-ifu-fetch
Count the number of instruction fetches, both cacheable and non-cacheable.
.It Li p6-ifu-fetch-miss
Count the number of instruction fetch misses (i.e., those that produce
memory accesses).
.It Li p6-ifu-mem-stall
Count the number of cycles instruction fetch is stalled for any reason.
.It Li p6-ild-stall
Count the number of cycles the instruction length decoder is stalled.
.It Li p6-inst-decoded
Count the number of instructions decoded.
.It Li p6-inst-retired
Count the number of instructions retired.
.It Li p6-itlb-miss
Count the number of instruction TLB misses.
.It Li p6-l2-ads
Count the number of L2 address strobes.
.It Li p6-l2-dbus-busy
Count the number of cycles during which the L2 cache data bus was busy.
.It Li p6-l2-dbus-busy-rd
Count the number of cycles during which the L2 cache data bus was busy
transferring read data from L2 to the processor.
.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
Count the number of L2 instruction fetches.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
Count the number of L2 data loads.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M"
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor errata E53.
.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
Count the number of L2 lines allocated.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M"
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor errata E45.
.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
Count the number of L2 lines evicted.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M" only
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor errata E45.
.It Li p6-l2-m-lines-inm
Count the number of modified lines allocated in L2 cache.
.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
Count the number of L2 M-state lines evicted.
.Pp
.Pq Tn "Pentium M"
On these processors an additional qualifier may be specified and
comprises a list of the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li hw
Count hardware-prefetched lines only.
.It Li nonhw
Exclude hardware-prefetched lines.
.El
.Pp
The default is to count both hardware-prefetched and
non-hardware-prefetched operations.
.Pq Errata
This event is affected by processor errata E53.
.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
Count the total number of L2 requests.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier
Count the number of L2 data stores.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
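.Pp
As an illustration of the
.Ql +
separated MESI qualifier syntax used by the L2 events above, the
following C sketch assembles an event specifier string from a set of
flags.
It is not part of the library: the function name
.Fn p6_mesi_spec
and the
.Dv MESI_*
constants are hypothetical, and the sketch only builds the string without
validating it against the hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, one per MESI keyword described above. */
enum mesi_flag {
	MESI_M = 1 << 0,
	MESI_E = 1 << 1,
	MESI_S = 1 << 2,
	MESI_I = 1 << 3
};

/*
 * Write "<event>" or "<event>,umask=<kw>+<kw>..." into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
int
p6_mesi_spec(char *buf, size_t len, const char *event, unsigned flags)
{
	static const struct {
		unsigned bit;
		const char *kw;
	} map[] = {
		{ MESI_M, "m" }, { MESI_E, "e" }, { MESI_S, "s" }, { MESI_I, "i" }
	};
	size_t i, used;
	int n, first = 1;

	assert(buf != NULL && event != NULL);
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if ((flags & map[i].bit) == 0)
			continue;
		/* The first keyword follows ",umask=", later ones follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    first ? ",umask=" : "+", map[i].kw);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
		first = 0;
	}
	return (0);
}
```

Passing no flags leaves the bare event name, which selects the documented
default of all MESI states.
The resulting string is the kind of event specifier that would be handed
to
.Xr pmc_allocate 3 .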
.It Li p6-ld-blocks
Count the number of load operations delayed due to store buffer blocks.
.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (crossing a 64
bit boundary).
.It Li p6-mmx-assist
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX assists executed.
.It Li p6-mmx-instr-exec
.Pq Tn Celeron , Tn "Pentium II"
Count the number of MMX instructions executed, except MOVQ and MOVD
stores from register to memory.
.It Li p6-mmx-instr-ret
.Pq Tn "Pentium II"
Count the number of MMX instructions retired.
.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX instructions executed.
An additional qualifier may be specified and comprises a list of
the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li pack
Count MMX pack operation instructions.
.It Li packed-arithmetic
Count MMX packed arithmetic instructions.
.It Li packed-logical
Count MMX packed logical instructions.
.It Li packed-multiply
Count MMX packed multiply instructions.
.It Li packed-shift
Count MMX packed shift instructions.
.It Li unpack
Count MMX unpack operation instructions.
.El
.Pp
The default is to count all operations.
.It Li p6-mmx-sat-instr-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX saturating instructions executed.
.It Li p6-mmx-uops-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX micro-ops executed.
.It Li p6-mul
Count the number of integer and floating-point multiplies, including
speculative multiplies.
This event is only allocated on counter 1.
.It Li p6-partial-rat-stalls
Count the number of cycles or events for partial stalls.
.It Li p6-resource-stalls
Count the number of cycles there was a resource-related stall of any kind.
.It Li p6-ret-seg-renames
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register rename events retired.
.It Li p6-sb-drains
Count the number of cycles the store buffer is draining.
.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renames.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count renames for segment register DS.
.It Li es
Count renames for segment register ES.
.It Li fs
Count renames for segment register FS.
.It Li gs
Count renames for segment register GS.
.El
.Pp
The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renaming stalls.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count stalls for segment register DS.
.It Li es
Count stalls for segment register ES.
.It Li fs
Count stalls for segment register FS.
.It Li gs
Count stalls for segment register GS.
.El
.Pp
The default is to count operations affecting all the segment registers.
.It Li p6-segment-reg-loads
Count the number of segment register loads.
.It Li p6-uops-retired
Count the number of micro-ops retired.
.El
.Ss Intel P4 PMCS
Intel P4 PMCs are present in Intel
.Tn "Pentium 4"
and
.Tn Xeon
processors.
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
Further information about using these PMCs may be found in
.Rs
.%B "IA-32 Intel(R) Architecture Optimization Guide"
.%D 2003
.%N "Order Number 248966-009"
.%Q "Intel Corporation"
.Re
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
.%N "Document Number: 249199-059"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel P4 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li active= Ns Ar choice
(On P4 HTT CPUs) Filter event counting based on which logical
processors are active.
The allowed values of
.Ar choice
are:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count when either logical processor is active.
.It Li both
Count when both logical processors are active.
.It Li none
Count only when neither logical processor is active.
.It Li single
Count only when one logical processor is active.
.El
.Pp
The default is
.Dq Li both .
.It Li cascade
Configure the PMC to cascade onto its partner.
See
.Sx "Cascading P4 PMCs"
below for more information.
.It Li edge
Configure the counter to count false to true transitions of the threshold
comparison output.
This qualifier only takes effect if a threshold qualifier has also been
specified.
.It Li complement
Configure the counter to increment only when the event count seen is
less than the threshold qualifier value specified.
.It Li mask= Ns Ar qualifier
Many event specifiers for Intel P4 PMCs need to be additionally
qualified using a mask qualifier.
The allowed syntax for these qualifiers is event specific and is
described along with the events.
.It Li os
Configure the PMC to count when the CPL of the processor is 0.
.It Li precise
Select precise event based sampling.
Precise sampling is supported by the hardware for a limited set of
events.
.It Li tag= Ns Ar value
Configure the PMC to tag the internal uop selected by the other
fields in this event specifier with value
.Ar value .
This feature is used when cascading PMCs.
.It Li threshold= Ns Ar value
Configure the PMC to increment only when the event counts seen are
greater than the specified threshold value
.Ar value .
.It Li usr
Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
On Intel Pentium 4 processors with HTT, events are
divided into two classes:
.Pp
.Bl -tag -width indent -compact
.It "TS Events"
are those where hardware can differentiate between events
generated on one logical processor from those generated on the
other.
.It "TI Events"
are those where hardware cannot differentiate between events
generated by multiple logical processors in a package.
.El
.Pp
Only TS events are allowed for use with process-mode PMCs on
Pentium-4/HTT CPUs.
.Pp
The event specifiers supported by Intel P4 PMCs are:
.Pp
.Bl -tag -width indent
.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 128 bit SIMD integer operands in memory or
XMM register.
.El
.Pp
If an instruction contains more than one 128 bit MMX uop, then each
uop will be counted.
.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count MMX instructions that operate on 64 bit SIMD operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 64 bit SIMD integer operands in memory or
in MMX registers.
.El
.Pp
If an instruction contains more than one 64 bit MMX uop, then each
uop will be counted.
.It Li p4-b2b-cycles
.Pq "TI event"
Count back-to-back bus cycles.
Further documentation for this event is unavailable.
.It Li p4-bnr
.Pq "TI event"
Count bus-not-ready conditions.
Further documentation for this event is unavailable.
.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count instruction fetch requests qualified by additional
flags specified in
.Ar qualifier .
At this point only one flag is supported:
.Pp
.Bl -tag -width indent -compact
.It Li tcmiss
Count trace cache lookup misses.
.El
.Pp
The default qualifier is also
.Dq Li mask=tcmiss .
.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count retired branches.
Qualifier
.Ar flags
is a list of the following
.Ql +
separated strings:
.Pp
.Bl -tag -width indent -compact
.It Li mmnp
Count branches not-taken and predicted.
.It Li mmnm
Count branches not-taken and mis-predicted.
.It Li mmtp
Count branches taken and predicted.
.It Li mmtm
Count branches taken and mis-predicted.
.El
.Pp
The default qualifier counts all four kinds of branches.
.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count the number of entries (clipped at 15) currently active in the
BSQ.
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
Edge triggering using the
.Dq Li edge
qualifier should not be used with this event when counting cycles.
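.Pp
The multi-bit fields above are selected by naming each set bit as a
separate flag.
For example, memory type WB is encoded as 6 (binary 110), which
corresponds to setting
.Dq Li mem-type2
and
.Dq Li mem-type1 .
The following C sketch shows one way this mapping could be computed; the
function name
.Fn bsq_field_flags
and the
.Dv BSQ_MEM_*
constants are hypothetical and are not provided by the library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Memory type encodings as listed in the table above (hypothetical names). */
enum bsq_mem_type {
	BSQ_MEM_UC = 0,
	BSQ_MEM_USWC = 1,
	BSQ_MEM_WT = 4,
	BSQ_MEM_WP = 5,
	BSQ_MEM_WB = 6
};

/*
 * Expand an nbits-wide field value into "+"-separated flag names of the
 * form <prefix><bit>, e.g. value 6 with prefix "mem-type" yields
 * "mem-type1+mem-type2".  A value of 0 yields an empty string.
 * Returns 0 on success, -1 if value does not fit in nbits or buf is
 * too small.
 */
int
bsq_field_flags(char *buf, size_t len, const char *prefix, unsigned nbits,
    unsigned value)
{
	size_t used = 0;
	unsigned bit;
	int n;

	assert(buf != NULL && prefix != NULL && len > 0);
	assert(strlen(prefix) > 0);
	if (nbits >= 32 || (value >> nbits) != 0)
		return (-1);
	buf[0] = '\0';
	for (bit = 0; bit < nbits; bit++) {
		if ((value & (1u << bit)) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s%u",
		    used > 0 ? "+" : "", prefix, bit);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

The same helper covers the two-bit
.Dq Li req-type
and
.Dq Li req-len
fields by passing a width of 2.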
.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count allocations in the bus sequence unit according to the flags
specified in
.Ar qualifier ,
which is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
This event is usually used along with the
.Dq Li edge
qualifier to avoid multiple counting.
.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count cache references as seen by the bus unit (2nd or 3rd level
cache references).
Qualifier
.Ar qualifier
is a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li rd-2ndl-hits
Count 2nd level cache hits in the shared state.
.It Li rd-2ndl-hite
Count 2nd level cache hits in the exclusive state.
.It Li rd-2ndl-hitm
Count 2nd level cache hits in the modified state.
.It Li rd-3rdl-hits
Count 3rd level cache hits in the shared state.
.It Li rd-3rdl-hite
Count 3rd level cache hits in the exclusive state.
.It Li rd-3rdl-hitm
Count 3rd level cache hits in the modified state.
.It Li rd-2ndl-miss
Count 2nd level cache misses.
.It Li rd-3rdl-miss
Count 3rd level cache misses.
.It Li wr-2ndl-miss
Count write-back lookups from the data access cache that miss the 2nd
level cache.
.El
.Pp
The default is to count all the above events.
.It Li p4-execution-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the execution
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
The marked uops are not bogus.
.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to set all the above flags.
This event can be used for precise event based sampling.
.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the front-end
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to select both kinds of events.
This event can be used for precise event based sampling.
.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each DBSY or DRDY event selected by qualifier
.Ar flags .
Qualifier
.Ar flags
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li drdy-drv
Count when this processor is driving data onto the bus.
.It Li drdy-own
Count when this processor is reading data from the bus.
.It Li drdy-other
Count when data is on the bus but not being sampled by this processor.
.It Li dbsy-drv
Count when this processor reserves the bus for use in the next cycle
in order to drive data.
.It Li dbsy-own
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will sample.
.It Li dbsy-other
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will not sample.
.El
.Pp
Flags
.Dq Li drdy-own
and
.Dq Li drdy-other
are mutually exclusive.
Flags
.Dq Li dbsy-own
and
.Dq Li dbsy-other
are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
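.Pp
Because two pairs of flags are mutually exclusive, an application may
wish to check a mask string before using it.
The following C sketch shows one possible check; the helper names
.Fn mask_has
and
.Fn fsb_mask_ok
are hypothetical and are not provided by the library, whose own handling
of conflicting flags is not described here.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the "+"-separated list mask contains kw as a whole token. */
int
mask_has(const char *mask, const char *kw)
{
	size_t kwlen, toklen;
	const char *p, *end;

	assert(mask != NULL && kw != NULL);
	kwlen = strlen(kw);
	for (p = mask; *p != '\0'; p = (*end == '+') ? end + 1 : end) {
		end = p + strcspn(p, "+");
		toklen = (size_t)(end - p);
		if (toklen == kwlen && strncmp(p, kw, kwlen) == 0)
			return (1);
	}
	return (0);
}

/*
 * Return 1 if mask respects the two exclusions documented above,
 * 0 if it names both members of an exclusive pair.
 */
int
fsb_mask_ok(const char *mask)
{
	if (mask_has(mask, "drdy-own") && mask_has(mask, "drdy-other"))
		return (0);
	if (mask_has(mask, "dbsy-own") && mask_has(mask, "dbsy-other"))
		return (0);
	return (1);
}
```

Tokens are compared whole, so
.Dq Li drdy-other
is not mistaken for
.Dq Li drdy-own .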
.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count cycles during which the processor is not stopped.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li running
Count cycles when the processor is active.
.El
.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogusntag
Count non-bogus instructions that are not tagged.
.It Li nbogustag
Count non-bogus instructions that are tagged.
.It Li bogusntag
Count bogus instructions that are not tagged.
.It Li bogustag
Count bogus instructions that are tagged.
.El
.Pp
The default qualifier counts all the above kinds of instructions.
.It Li p4-ioq-active-entries Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count the number of entries (clipped at 15) in the IOQ that are
active.
The event masks are specified by qualifier
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier should not be used when counting cycles with this event.
The exact behaviour of this event depends on the processor revision.
.It Li p4-ioq-allocation Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count various types of transactions on the bus matching the flags set
in
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier is normally used with this event to prevent multiple
counting.
The exact behaviour of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li hit
Count ITLB hits.
.It Li miss
Count ITLB misses.
.It Li hit-uc
Count uncacheable ITLB hits.
.El
.Pp
If no
.Ar qualifier
is specified the default is to count all three kinds of ITLB
translations.
.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count replayed events at the load port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-ld
Count split loads.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-ld .
.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted IA-32 branch instructions.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count non-bogus retired branch instructions.
.El
.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of pipeline clears seen by the processor.
Qualifier
.Ar flags
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li clear
Count for a portion of the many cycles when the machine is being
cleared for any reason.
.It Li moclear
Count machine clears due to memory ordering issues.
.It Li smclear
Count machine clears due to self-modifying code.
.El
.Pp
Use qualifier
.Dq Li edge
to get a count of occurrences of machine clears.
The default qualifier is
.Dq Li clear .
.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the cancelling of various kinds of requests in the data cache
address control unit of the CPU.
The qualifier
.Ar event-list
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li st-rb-full
Requests cancelled because no store request buffer was available.
.It Li 64k-conf
Requests that conflict due to 64K aliasing.
.El
.Pp
If
.Ar event-list
is not specified, then the default is to count both kinds of events.
.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the completion of load split, store split, uncacheable split and
uncacheable load operations selected by qualifier
.Ar event-list .
The qualifier
.Ar event-list
is a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li lsc
Count load splits completed, excluding loads from uncacheable or
write-combining areas.
.It Li ssc
Count any split stores completed.
.El
.Pp
The default is to count both kinds of operations.
.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count load replays triggered by the memory order buffer.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li no-sta
Count replays because of unknown store addresses.
.It Li no-std
Count replays because of unknown store data.
.It Li partial-data
Count replays because of partially overlapped data accesses between
load and store operations.
.It Li unalgn-addr
Count replays because of mismatches in the lower 4 bits of load and
store operations.
.El
.Pp
The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed double-precision operands.
.El
.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed single-precision operands.
.El
.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count page walks performed by the page miss handler.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dtmiss
Count page walks for data TLB misses.
.It Li itmiss
Count page walks for instruction TLB misses.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li dtmiss+itmiss .
.It Li p4-replay-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the replay
tagging mechanism.
Qualifier
.Ar flags
contains a
.Ql +
separated set of the following strings:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default qualifier counts both kinds of uops.
This event can be used for precise event based sampling.
.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the occurrence or latency of stalls in the allocator.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li sbfull
A stall due to the lack of store buffers.
.El
.It Li p4-response
.Pq "TI event"
Count different types of responses.
Further documentation on this event is not available.
.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count direct and indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count the number of scalar double-precision uops.
.El
.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar single-precision operands.
.El
.It Li p4-snoop
.Pq "TI event"
Count snoop traffic.
Further documentation on this event is not available.
.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times an assist is required to handle problems
with the operands for SSE and SSE2 operations.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count assists for all SSE and SSE2 uops.
.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, or in machine clear or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
.It Li p4-uop-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
This event is used in conjunction with the front-end at-retirement
mechanism to tag load and store uops.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li tagloads
Mark uops that are load operations.
.It Li tagstores
Mark uops that are store operations.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count uops retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count marked uops that are not bogus.
.It Li bogus
Count marked uops that are bogus.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count write-combining buffer operations.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li wcb-evicts
WC buffer evictions due to any cause.
.It Li wcb-full-evict
WC buffer evictions due to no WC buffer being available.
.El
.Pp
The default qualifier counts both kinds of evictions.
.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of x87 instructions that required special
handling.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li fpsu
Count instructions that saw an FP stack underflow.
.It Li fpso
Count instructions that saw an FP stack overflow.
.It Li poao
Count instructions that saw an x87 output overflow.
.It Li poau
Count instructions that saw an x87 output underflow.
.It Li prea
Count instructions that needed an x87 input assist.
.El
.Pp
The default qualifier counts all the above types of instruction
retirements.
.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count x87 floating-point uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all x87 floating-point uops.
.El
.Pp
If an instruction contains more than one x87 floating-point uop, then
all x87 floating-point uops will be counted.
This event does not count x87 floating-point data movement operations.
.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
This event does not count integer move uops.
Qualifier
.Ar flags
may contain the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li allp0
Count all x87 and SIMD store and move uops.
.It Li allp2
Count all x87 and SIMD load uops.
.El
.Pp
The default is to count all uops.
.Pq Errata
This event may be affected by processor errata N43.
.El
.Ss "Cascading P4 PMCs"
PMC cascading support is currently poorly implemented.
While individual event counters may be allocated with a
.Dq Li cascade
qualifier, the current API does not offer the ability
to name and allocate all the resources needed for a
cascaded event counter pair in a single operation.
.Ss "Precise Event Based Sampling"
Support for precise event based sampling is currently
unimplemented.
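.Sh EXAMPLES
The qualifier syntax described under
.Sx "EVENT SPECIFIERS"
can be assembled programmatically.
The following is a minimal sketch, not part of the
.Nm pmc
library, of how an application might build a P4 event specifier of
the form
.Dq Li event,mask=flag+flag
and check the mutual exclusion rule documented for
.Dq Li p4-fsb-data-activity .
The helper names
.Fn build_event_spec
and
.Fn fsb_flags_valid
are hypothetical and chosen only for illustration.
The resulting string is the kind of value an application would pass as
the event specifier to
.Xr pmc_allocate 3 .

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libpmc function): write
 * "<event>,mask=<flag>+<flag>..." into buf.  With nflags == 0 only the
 * event name is written, which leaves the event's default mask in effect.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
int
build_event_spec(char *buf, size_t len, const char *event,
    const char *const *flags, size_t nflags)
{
	size_t i, used;
	int n;

	assert(buf != NULL && event != NULL);
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;
	for (i = 0; i < nflags; i++) {
		/* The first flag follows ",mask="; later flags follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", flags[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}

/* Return 1 if name occurs in flags[0..nflags-1], else 0. */
static int
has_flag(const char *const *flags, size_t nflags, const char *name)
{
	size_t i;

	for (i = 0; i < nflags; i++)
		if (strcmp(flags[i], name) == 0)
			return (1);
	return (0);
}

/*
 * Hypothetical check for p4-fsb-data-activity: "drdy-own" and
 * "drdy-other" are mutually exclusive, as are "dbsy-own" and
 * "dbsy-other".  Returns 1 if the flag set respects both rules.
 */
int
fsb_flags_valid(const char *const *flags, size_t nflags)
{
	if (has_flag(flags, nflags, "drdy-own") &&
	    has_flag(flags, nflags, "drdy-other"))
		return (0);
	if (has_flag(flags, nflags, "dbsy-own") &&
	    has_flag(flags, nflags, "dbsy-other"))
		return (0);
	return (1);
}
```

Calling
.Fn build_event_spec
with the event name
.Dq Li p4-fsb-data-activity
and the flags
.Dq Li drdy-drv
and
.Dq Li dbsy-drv
yields the string
.Dq Li p4-fsb-data-activity,mask=drdy-drv+dbsy-drv .
The sketch does not check flag names against the lists above; an
unknown name would presumably be rejected by the library when the PMC
is allocated.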
.Sh COMPATIBILITY
The interface between the
.Nm pmc
library and the
.Xr hwpmc 4
driver is intended to be private to the implementation and may
change.
In order to ease forward compatibility with future versions of the
.Xr hwpmc 4
driver, applications are urged to dynamically link with the
.Nm pmc
library.
.Pp
The
.Nm pmc
API is
.Ud
.Sh SEE ALSO
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr pmcstat 8
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .