.\" Copyright (c) 2003-2007 Joseph Koshy. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd November 25, 2007
.Os
.Dt PMC 3
.Sh NAME
.Nm pmc
.Nd library for accessing hardware performance monitoring counters
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
The
.Lb libpmc
provides a programming interface that allows applications to use
hardware performance counters to gather performance data about
specific processes or for the system as a whole.
The library is implemented using the lower-level facilities offered by
the
.Xr hwpmc 4
driver.
.Ss Key Concepts
Performance monitoring counters (PMCs) are represented by the library
using a software abstraction.
These
.Dq abstract
PMCs can have one of two scopes:
.Bl -bullet
.It
System scope.
These PMCs measure events in a whole-system manner, i.e., independent
of the currently executing thread.
System scope PMCs are allocated on specific CPUs and do not
migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
.Xr hwpmc 4
sysctl tunable
.Va security.bsd.unprivileged_syspmcs
is non-zero.
.It
Process scope.
These PMCs only measure hardware events when the processes they are
attached to are executing on a CPU.
In an SMP system, process scope PMCs migrate between CPUs along with
their target processes.
.El
.Pp
Orthogonal to PMC scope, PMCs may be allocated in one of two
operational modes:
.Bl -bullet
.It
Counting PMCs measure events according to their scope
(system or process).
The application needs to explicitly read these counters
to retrieve their value.
.It
Sampling PMCs cause the CPU to be periodically interrupted
and information about its state of execution to be collected.
Sampling PMCs are used to profile specific processes and kernel
threads or to profile the system as a whole.
.El
.Pp
The scope and operational mode for a software PMC are specified at
PMC allocation time.
An application is allowed to allocate multiple PMCs subject
to availability of hardware resources.
.Pp
The library uses human-readable strings to name the event being
measured by hardware.
The syntax used for specifying a hardware event along with additional
event specific qualifiers (if any) is described in detail in section
.Sx "EVENT SPECIFIERS"
below.
.Pp
PMCs are associated with the process that allocated them and
will be automatically reclaimed by the system when the process exits.
Additionally, process-scope PMCs have to be attached to one or more
target processes before they can perform measurements.
A process-scope PMC may be attached to those target processes
that its owner process would otherwise be permitted to debug.
An owner process may attach PMCs to itself allowing
it to measure its own behavior.
Additionally, on some machine architectures, such self-attached PMCs
may be read cheaply using specialized instructions supported by the
processor.
.Pp
Certain kinds of PMCs require that a log file be configured before
they may be started.
These include:
.Bl -bullet -compact
.It
System scope sampling PMCs.
.It
Process scope sampling PMCs.
.It
Process scope counting PMCs that have been configured to report PMC
readings on process context switches or process exits.
.El
Up to one log file may be configured per owner process.
Events logged to a log file may be subsequently analyzed using the
.Xr pmclog 3
family of functions.
.Ss Supported CPUs
The CPUs known to the PMC library are named by the
.Vt "enum pmc_cputype"
enumeration.
Supported CPUs include:
.Bl -tag -width PMC_CPU_INTEL_PIII -compact
.It PMC_CPU_AMD_K7
.Tn "AMD Athlon"
CPUs.
.It PMC_CPU_AMD_K8
.Tn "AMD Athlon64"
CPUs.
.It PMC_CPU_INTEL_P6
.Tn Intel
.Tn "Pentium Pro"
CPUs.
.It PMC_CPU_INTEL_PII
.Tn "Intel Pentium II"
CPUs.
.It PMC_CPU_INTEL_PIII
.Tn "Intel Pentium III"
CPUs.
.It PMC_CPU_INTEL_PM
.Tn "Intel Pentium M"
CPUs.
.It PMC_CPU_INTEL_PIV
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss Supported PMCs
PMCs supported by this library are named by the
.Vt "enum pmc_class"
enumeration.
Supported PMC kinds include:
.Bl -tag -width PMC_CLASS_TSC -compact
.It PMC_CLASS_TSC
The timestamp counter on i386 and amd64 architecture CPUs.
.It PMC_CLASS_K7
Programmable hardware counters present in
.Tn "AMD Athlon"
CPUs.
.It PMC_CLASS_K8
Programmable hardware counters present in
.Tn "AMD Athlon64"
CPUs.
.It PMC_CLASS_P6
Programmable hardware counters present in
.Tn Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn "Pentium III" ,
.Tn "Celeron" ,
and
.Tn "Pentium M"
CPUs.
.It PMC_CLASS_P4
Programmable hardware counters present in
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss PMC Capabilities
Capabilities of performance monitoring hardware are denoted using
the
.Vt "enum pmc_caps"
enumeration.
Supported capabilities include:
.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
.It PMC_CAP_EDGE
The ability to count negated-to-asserted transitions of the hardware
conditions being probed for.
.It PMC_CAP_INTERRUPT
The ability to interrupt the CPU.
.It PMC_CAP_INVERT
The ability to invert the sense of the hardware conditions being
measured.
.It PMC_CAP_READ
PMC hardware allows the CPU to read performance counters.
.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
system dependent way.
.It PMC_CAP_SYSTEM
The ability to restrict counting of hardware events to when the CPU is
running privileged code.
.It PMC_CAP_THRESHOLD
The ability to ignore simultaneous hardware events below a
programmable threshold.
.It PMC_CAP_USER
The ability to restrict counting of hardware events to those when the
CPU is running unprivileged code.
.It PMC_CAP_WRITE
PMC hardware allows CPUs to write to counters.
.El
.Ss Functional Grouping
This section contains a brief overview of the available functionality
in the PMC library.
Each function listed here is described further in its own manual page.
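.Pp
As a rough illustration of how an application might work with the
capability flags listed under
.Sx "PMC Capabilities" ,
the sketch below treats capabilities as bits in a 32-bit mask, checks
for required bits, and formats the set bits as names.
It assumes that capabilities combine as a bit mask.
The
.Dv EX_CAP_*
names, their bit values, and both helper functions are hypothetical
and are not part of the library; a real program would use the
.Dv PMC_CAP_*
constants from
.In pmc.h .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Illustrative stand-ins for the PMC_CAP_* flags.  The names and bit
 * positions are assumptions made for this sketch only.
 */
#define EX_CAP_INTERRUPT 0x01u
#define EX_CAP_USER      0x02u
#define EX_CAP_SYSTEM    0x04u
#define EX_CAP_EDGE      0x08u
#define EX_CAP_READ      0x10u
#define EX_CAP_WRITE     0x20u

struct ex_cap_name {
    uint32_t flag;
    const char *name;
};

static const struct ex_cap_name ex_cap_names[] = {
    { EX_CAP_INTERRUPT, "interrupt" },
    { EX_CAP_USER, "user" },
    { EX_CAP_SYSTEM, "system" },
    { EX_CAP_EDGE, "edge" },
    { EX_CAP_READ, "read" },
    { EX_CAP_WRITE, "write" },
};

/* Return nonzero if every capability in `wanted` is set in `caps`. */
static int
ex_has_caps(uint32_t caps, uint32_t wanted)
{
    return ((caps & wanted) == wanted);
}

/*
 * Write the names of the capabilities set in `caps` into `buf`,
 * separated by '+'.  Returns 0 on success, or -1 if `buf` is too
 * small to hold the result and its terminating NUL.
 */
static int
ex_format_caps(uint32_t caps, char *buf, size_t len)
{
    size_t i, n, used = 0;

    if (len == 0)
        return (-1);
    buf[0] = 0;
    for (i = 0; i < sizeof(ex_cap_names) / sizeof(ex_cap_names[0]); i++) {
        if ((caps & ex_cap_names[i].flag) == 0)
            continue;
        /* Space needed: the name, plus a separator if not first. */
        n = strlen(ex_cap_names[i].name) + (used > 0 ? 1 : 0);
        if (used + n >= len)
            return (-1);
        if (used > 0)
            buf[used++] = '+';
        strcpy(buf + used, ex_cap_names[i].name);
        used += strlen(ex_cap_names[i].name);
    }
    return (0);
}
```
.Ed
.Pp
A sampling tool could, for example, call the hypothetical
.Fn ex_has_caps
with the interrupt bit before attempting to use a counter for
sampling, since sampling relies on the ability to interrupt the CPU.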
.Bl -tag -width indent
.It Administration
.Bl -tag -compact
.It Fn pmc_disable , Fn pmc_enable
Administratively disable (enable) specific performance monitoring
counter hardware.
Counters that are disabled will not be available to applications to
use.
.El
.It "Convenience Functions"
.Bl -tag -compact
.It Fn pmc_event_names_of_class
Returns a list of event names supported by a given PMC type.
.It Fn pmc_name_of_capability
Convert a
.Dv PMC_CAP_*
flag to a human-readable string.
.It Fn pmc_name_of_class
Convert a
.Dv PMC_CLASS_*
constant to a human-readable string.
.It Fn pmc_name_of_cputype
Return a human-readable name for a CPU type.
.It Fn pmc_name_of_disposition
Return a human-readable string describing a PMC's disposition.
.It Fn pmc_name_of_event
Convert a numeric event code to a human-readable string.
.It Fn pmc_name_of_mode
Convert a
.Dv PMC_MODE_*
constant to a human-readable name.
.It Fn pmc_name_of_state
Return a human-readable string describing a PMC's current state.
.El
.It "Library Initialization"
.Bl -tag -compact
.It Fn pmc_init
Initialize the library.
This function must be called before any other library function.
.El
.It "Log File Handling"
.Bl -tag -compact
.It Fn pmc_configure_logfile
Configure a log file for
.Xr hwpmc 4
to write logged events to.
.It Fn pmc_flush_logfile
Flush all pending log data in
.Xr hwpmc 4 Ns Ap s
buffers.
.It Fn pmc_writelog
Append arbitrary user data to the current log file.
.El
.It "PMC Management"
.Bl -tag -compact
.It Fn pmc_allocate , Fn pmc_release
Allocate (free) a PMC.
.It Fn pmc_attach , Fn pmc_detach
Attach (detach) a process scope PMC to a target.
.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
Read (write) a value from (to) a PMC.
.It Fn pmc_start , Fn pmc_stop
Start (stop) a software PMC.
.It Fn pmc_set
Set the reload value for a sampling PMC.
.El
.It "Queries"
.Bl -tag -compact
.It Fn pmc_capabilities
Retrieve the capabilities for a given PMC.
.It Fn pmc_cpuinfo
Retrieve information about the CPUs and PMC hardware present in the
system.
.It Fn pmc_get_driver_stats
Retrieve statistics maintained by
.Xr hwpmc 4 .
.It Fn pmc_ncpu
Determine the number of CPUs in the system.
.It Fn pmc_npmc
Return the number of hardware PMCs present in a given CPU.
.It Fn pmc_pmcinfo
Return information about the state of a given CPU's PMCs.
.It Fn pmc_width
Determine the width of a hardware counter in bits.
.El
.It "x86 Architecture Specific API"
.Bl -tag -compact
.It Fn pmc_get_msr
Returns the processor model specific register number
associated with
.Fa pmc .
Applications may then use the x86
.Ic RDPMC
instruction to directly read the contents of the PMC.
.El
.El
.Ss Signal Handling Requirements
Applications using PMCs are required to handle the following signals:
.Bl -tag -width ".Dv SIGBUS"
.It Dv SIGBUS
When the
.Xr hwpmc 4
module is unloaded using
.Xr kldunload 8 ,
processes that have PMCs allocated to them will be sent a
.Dv SIGBUS
signal.
.It Dv SIGIO
The
.Xr hwpmc 4
driver will send a PMC owning process a
.Dv SIGIO
signal if:
.Bl -bullet
.It
Any process scope PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
configured log file.
This error may be retrieved by a subsequent call to
.Fn pmc_flush_logfile .
.El
.El
.Ss Typical Program Flow
.Bl -enum
.It
An application would first invoke function
.Fn pmc_init
to allow the library to initialize itself.
.It
Signal handling would then be set up.
.It
Next the application would allocate the PMCs it desires using function
.Fn pmc_allocate .
.It
Initial values for PMCs may be set using function
.Fn pmc_set .
.It
If a log file is necessary for the PMCs to work, it would
be configured using function
.Fn pmc_configure_logfile .
.It
Process scope PMCs would then be attached to their target processes
using function
.Fn pmc_attach .
.It
The PMCs would then be started using function
.Fn pmc_start .
.It
Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
For PMCs that write events to the log file, this logged data would be
read and parsed using the
.Xr pmclog 3
family of functions.
.It
PMCs are stopped using function
.Fn pmc_stop ,
and process scope PMCs are detached from their targets using
function
.Fn pmc_detach .
.It
Before the process exits, it may release its PMCs using function
.Fn pmc_release .
Any configured log file may be closed using function
.Fn pmc_configure_logfile .
.El
.Sh EVENT SPECIFIERS
Event specifiers are strings comprising an event name, followed by
optional parameters modifying the semantics of the hardware event
being probed.
Event names are PMC architecture dependent, but the
.Nm pmc
library defines machine independent aliases for commonly used
events.
.Ss Event Name Aliases
Event name aliases are CPU architecture independent names for commonly
used events.
The following aliases are known to this version of the
.Nm pmc
library:
.Bl -tag -width indent
.It Li branches
Measure the number of branches retired.
.It Li branch-mispredicts
Measure the number of retired branches that were mispredicted.
.It Li cycles
Measure processor cycles.
This event is implemented using the processor's Time Stamp Counter
register.
.It Li dc-misses
Measure the number of data cache misses.
.It Li ic-misses
Measure the number of instruction cache misses.
.It Li instructions
Measure the number of instructions retired.
.It Li interrupts
Measure the number of interrupts seen.
.It Li unhalted-cycles
Measure the number of cycles the processor is not in a halted
or sleep state.
.El
.Ss Time Stamp Counter (TSC)
The timestamp counter is a monotonically non-decreasing counter that
counts processor cycles.
.Pp
In the i386 architecture, this counter may
be selected by requesting an event with event specifier
.Dq Li tsc .
The
.Dq Li tsc
event does not support any further qualifiers.
It can only be allocated in system-wide counting mode,
and is a read-only counter.
Multiple processes are allowed to allocate the TSC.
Once allocated, it may be read using the
.Fn pmc_read
function, or by using the RDTSC instruction.
.Ss AMD (K7) PMCs
These PMCs are present in the
.Tn "AMD Athlon"
series of CPUs and are documented in:
.Rs
.%B "AMD Athlon Processor x86 Code Optimization Guide"
.%N "Publication No. 22007"
.%D "February 2002"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K7 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li unitmask= Ns Ar mask
This qualifier is used to further qualify a select few events,
.Dq Li k7-dc-refills-from-l2 ,
.Dq Li k7-dc-refills-from-system
and
.Dq Li k7-dc-writebacks .
Here
.Ar mask
is a string of the following characters optionally separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li m
Count operations for lines in the
.Dq Modified
state.
.It Li o
Count operations for lines in the
.Dq Owner
state.
.It Li e
Count operations for lines in the
.Dq Exclusive
state.
.It Li s
Count operations for lines in the
.Dq Shared
state.
.It Li i
Count operations for lines in the
.Dq Invalid
state.
.El
.Pp
If no
.Dq Li unitmask
qualifier is specified, the default is to count events for cache
lines in any of the above states.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K7 PMCs are:
.Bl -tag -width indent
.It Li k7-dc-accesses
Count data cache accesses.
.It Li k7-dc-misses
Count data cache misses.
.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
Count data cache refills from L2 cache.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
Count data cache refills from system memory.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
Count data cache writebacks.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
Count L1 DTLB misses and L2 DTLB hits.
.It Li k7-l1-and-l2-dtlb-misses
Count L1 and L2 DTLB misses.
.It Li k7-misaligned-references
Count misaligned data references.
.It Li k7-ic-fetches
Count instruction cache fetches.
.It Li k7-ic-misses
Count instruction cache misses.
.It Li k7-l1-itlb-misses
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k7-l1-l2-itlb-misses
Count L1 (and L2) ITLB misses.
.It Li k7-retired-instructions
Count all retired instructions.
.It Li k7-retired-ops
Count retired ops.
.It Li k7-retired-branches
Count all retired branches (conditional, unconditional, exceptions
and interrupts).
.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
.It Li k7-retired-taken-branches
Count retired taken branches.
.It Li k7-retired-taken-branches-mispredicted
Count mispredicted taken branches that were retired.
.It Li k7-retired-far-control-transfers
Count retired far control transfers.
.It Li k7-retired-resync-branches
Count retired resync branches (non control transfer branches).
.It Li k7-interrupts-masked-cycles
Count the number of cycles when the processor's
.Va IF
flag was zero.
.It Li k7-interrupts-masked-while-pending-cycles
Count the number of cycles interrupts were masked while pending due
to the processor's
.Va IF
flag being zero.
.It Li k7-hardware-interrupts
Count the number of taken hardware interrupts.
.El
.Ss AMD (K8) PMCs
These PMCs are present in the
.Tn "AMD Athlon64"
and
.Tn "AMD Opteron"
series of CPUs.
They are documented in:
.Rs
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%D "April 2004"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K8 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
STPCLK states.
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cancelled
Count cancelled requests.
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tag-snoop
Count tag snoop requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-dc-access
Count data cache accesses including microcode scratchpad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
vintage.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li accesses
Count data cache accesses by lock instructions.
.It Li misses
Count data cache misses by lock instructions.
.El
.Pp
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li load
Count load operations.
.It Li nta
Count non-temporal operations.
.It Li store
Count store operations.
.El
.Pp
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
.It Li k8-dc-miss
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li scrubber
Count scrubber detected errors.
.It Li piggyback
Count piggyback scrubber errors.
.El
.Pp
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
.El
.Pp
The default is to count all types of ops.
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
was empty).
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retirement.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
is pending.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
.El
.Pp
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., when CPU RFLAGS field IF
was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li low-op-pos-0
Count instructions with the low op in position 0.
.It Li low-op-pos-1
Count instructions with the low op in position 1.
.It Li low-op-pos-2
Count instructions with the low op in position 2.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmx-3dnow
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
.It Li x87
Count x87 instructions.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
address miscompare.
.It Li k8-fr-retired-uops
Count retired uops.
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
1009.It Li k8-ic-fetch 1010Count instruction cache fetches. 1011.It Li k8-ic-instruction-fetch-stall 1012Count cycles in stalls due to instruction fetch. 1013.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit 1014Count L1 ITLB misses that are L2 ITLB hits. 1015.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss 1016Count ITLB misses that miss in both L1 and L2 ITLBs. 1017.It Li k8-ic-microarchitectural-resync-by-snoop 1018Count microarchitectural resyncs caused by snoops. 1019.It Li k8-ic-miss 1020Count instruction cache misses. 1021.It Li k8-ic-refill-from-l2 1022Count instruction cache refills from L2 cache. 1023.It Li k8-ic-refill-from-system 1024Count instruction cache refills from system memory. 1025.It Li k8-ic-return-stack-hits 1026Count hits to the return stack. 1027.It Li k8-ic-return-stack-overflow 1028Count overflows of the return stack. 1029.It Li k8-ls-buffer2-full 1030Count load/store buffer2 full events. 1031.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier 1032Count locked operations. 1033For revision C and later CPUs, the following qualifiers are supported: 1034.Pp 1035.Bl -tag -width indent -compact 1036.It Li cycles-in-request 1037Count the number of cycles in the lock request/grant stage. 1038.It Li cycles-to-complete 1039Count the number of cycles a lock takes to complete once it is 1040non-speculative and is the older load/store operation. 1041.It Li locked-instructions 1042Count the number of lock instructions executed. 1043.El 1044.Pp 1045The default is to count the number of lock instructions executed. 1046.It Li k8-ls-microarchitectural-late-cancel 1047Count microarchitectural late cancels of operations in the load/store 1048unit. 1049.It Li k8-ls-microarchitectural-resync-by-self-modifying-code 1050Count microarchitectural resyncs caused by self-modifying code. 1051.It Li k8-ls-microarchitectural-resync-by-snoop 1052Count microarchitectural resyncs caused by snoops. 
1053.It Li k8-ls-retired-cflush-instructions 1054Count retired CLFLUSH instructions. 1055.It Li k8-ls-retired-cpuid-instructions 1056Count retired CPUID instructions. 1057.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier 1058Count segment register loads. 1059This event may be further qualified using 1060.Ar qualifier , 1061which is a 1062.Ql + 1063separated set of the following keywords: 1064.Bl -tag -width indent -compact 1065.It Li cs 1066Count CS register loads. 1067.It Li ds 1068Count DS register loads. 1069.It Li es 1070Count ES register loads. 1071.It Li fs 1072Count FS register loads. 1073.It Li gs 1074Count GS register loads. 1075.\" .It Li hs 1076.\" Count HS register loads. 1077.\" XXX "HS" register? 1078.It Li ss 1079Count SS register loads. 1080.El 1081.Pp 1082The default is to count all types of loads. 1083.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier 1084Count memory controller bypass counter saturation events. 1085This event may be further qualified using 1086.Ar qualifier , 1087which is a 1088.Ql + 1089separated set of the following keywords: 1090.Pp 1091.Bl -tag -width indent -compact 1092.It Li dram-controller-interface-bypass 1093Count DRAM controller interface bypasses. 1094.It Li dram-controller-queue-bypass 1095Count DRAM controller queue bypasses. 1096.It Li memory-controller-hi-pri-bypass 1097Count memory controller high priority bypasses. 1098.It Li memory-controller-lo-pri-bypass 1099Count memory controller low priority bypasses. 1100.El 1101.Pp 1102.It Li k8-nb-memory-controller-dram-slots-missed 1103Count memory controller DRAM command slots missed (in MemClks). 1104.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier 1105Count memory controller page access events.
1106This event may be further qualified using 1107.Ar qualifier , 1108which is a 1109.Ql + 1110separated set of the following keywords: 1111.Pp 1112.Bl -tag -width indent -compact 1113.It Li page-conflict 1114Count page conflicts. 1115.It Li page-hit 1116Count page hits. 1117.It Li page-miss 1118Count page misses. 1119.El 1120.Pp 1121The default is to count all types of events. 1122.It Li k8-nb-memory-controller-page-table-overflow 1123Count memory controller page table overflow events. 1124.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier 1125Count probe events. 1126This event may be further qualified using 1127.Ar qualifier , 1128which is a 1129.Ql + 1130separated set of the following keywords: 1131.Pp 1132.Bl -tag -width indent -compact 1133.It Li probe-hit 1134Count all probe hits. 1135.It Li probe-hit-dirty-no-memory-cancel 1136Count probe hits without memory cancels. 1137.It Li probe-hit-dirty-with-memory-cancel 1138Count probe hits with memory cancels. 1139.It Li probe-miss 1140Count probe misses. 1141.El 1142.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier 1143Count sized commands issued. 1144This event may be further qualified using 1145.Ar qualifier , 1146which is a 1147.Ql + 1148separated set of the following keywords: 1149.Pp 1150.Bl -tag -width indent -compact 1151.It Li nonpostwrszbyte 1152.It Li nonpostwrszdword 1153.It Li postwrszbyte 1154.It Li postwrszdword 1155.It Li rdszbyte 1156.It Li rdszdword 1157.It Li rdmodwr 1158.El 1159.Pp 1160The default is to count all types of commands. 1161.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier 1162Count memory controller turnaround events. 1163This event may be further qualified using 1164.Ar qualifier , 1165which is a 1166.Ql + 1167separated set of the following keywords: 1168.Pp 1169.Bl -tag -width indent -compact 1170.\" XXX doc is unclear whether these are cycle counts or event counts 1171.It Li dimm-turnaround 1172Count DIMM turnarounds.
1173.It Li read-to-write-turnaround 1174Count read to write turnarounds. 1175.It Li write-to-read-turnaround 1176Count write to read turnarounds. 1177.El 1178.Pp 1179The default is to count all types of events. 1180.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier 1181.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier 1182.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier 1183Count events on the HyperTransport(tm) buses. 1184These events may be further qualified using 1185.Ar qualifier , 1186which is a 1187.Ql + 1188separated set of the following keywords: 1189.Pp 1190.Bl -tag -width indent -compact 1191.It Li buffer-release 1192Count buffer release messages sent. 1193.It Li command 1194Count command messages sent. 1195.It Li data 1196Count data messages sent. 1197.It Li nop 1198Count nop messages sent. 1199.El 1200.Pp 1201The default is to count all types of messages. 1202.El 1203.Ss Intel P6 PMCS 1204Intel P6 PMCs are present in Intel 1205.Tn "Pentium Pro" , 1206.Tn "Pentium II" , 1207.Tn Celeron , 1208.Tn "Pentium III" 1209and 1210.Tn "Pentium M" 1211processors. 1212.Pp 1213These CPUs have two counters. 1214Some events may only be used on specific counters and some events are 1215defined only on specific processor models. 
1216.Pp 1217These PMCs are documented in 1218.Rs 1219.%B "IA-32 Intel(R) Architecture Software Developer's Manual" 1220.%T "Volume 3: System Programming Guide" 1221.%N "Order Number 245472-012" 1222.%D 2003 1223.%Q "Intel Corporation" 1224.Re 1225.Pp 1226Some of these events are affected by processor errata described in 1227.Rs 1228.%B "Intel(R) Pentium(R) III Processor Specification Update" 1229.%N "Document Number: 244453-054" 1230.%D "April 2005" 1231.%Q "Intel Corporation" 1232.Re 1233.Pp 1234Event specifiers for Intel P6 PMCs can have the following common 1235qualifiers: 1236.Bl -tag -width indent 1237.It Li cmask= Ns Ar value 1238Configure the PMC to increment only if the number of configured 1239events measured in a cycle is greater than or equal to 1240.Ar value . 1241.It Li edge 1242Configure the PMC to count the number of deasserted to asserted 1243transitions of the conditions expressed by the other qualifiers. 1244If specified, the counter will increment only once whenever a 1245condition becomes true, irrespective of the number of clocks during 1246which the condition remains true. 1247.It Li inv 1248Invert the sense of the comparison when the 1249.Dq Li cmask 1250qualifier is present, making the counter increment when the number of 1251events per cycle is less than the value specified by the 1252.Dq Li cmask 1253qualifier. 1254.It Li os 1255Configure the PMC to count events happening at processor privilege 1256level 0. 1257.It Li umask= Ns Ar value 1258This qualifier is used to further qualify the event selected (see 1259below). 1260.It Li usr 1261Configure the PMC to count events occurring at privilege levels 1, 2 1262or 3. 1263.El 1264.Pp 1265If neither of the 1266.Dq Li os 1267or 1268.Dq Li usr 1269qualifiers is specified, the default is to enable both.
1270.Pp 1271The event specifiers supported by Intel P6 PMCs are: 1272.Bl -tag -width indent 1273.It Li p6-baclears 1274Count the number of times a static branch prediction was made by the 1275branch decoder because the BTB did not have a prediction. 1276.It Li p6-br-bac-missp-exec 1277.Pq Tn "Pentium M" 1278Count the number of branch instructions executed that were 1279mispredicted at the Front End (BAC). 1280.It Li p6-br-bogus 1281Count the number of bogus branches. 1282.It Li p6-br-call-exec 1283.Pq Tn "Pentium M" 1284Count the number of call instructions executed. 1285.It Li p6-br-call-missp-exec 1286.Pq Tn "Pentium M" 1287Count the number of call instructions executed that were mispredicted. 1288.It Li p6-br-cnd-exec 1289.Pq Tn "Pentium M" 1290Count the number of conditional branch instructions executed. 1291.It Li p6-br-cnd-missp-exec 1292.Pq Tn "Pentium M" 1293Count the number of conditional branch instructions executed that were 1294mispredicted. 1295.It Li p6-br-ind-call-exec 1296.Pq Tn "Pentium M" 1297Count the number of indirect call instructions executed. 1298.It Li p6-br-ind-exec 1299.Pq Tn "Pentium M" 1300Count the number of indirect branch instructions executed. 1301.It Li p6-br-ind-missp-exec 1302.Pq Tn "Pentium M" 1303Count the number of indirect branch instructions executed that were 1304mispredicted. 1305.It Li p6-br-inst-decoded 1306Count the number of branch instructions decoded. 1307.It Li p6-br-inst-exec 1308.Pq Tn "Pentium M" 1309Count the number of branch instructions executed but not necessarily retired. 1310.It Li p6-br-inst-retired 1311Count the number of branch instructions retired. 1312.It Li p6-br-miss-pred-retired 1313Count the number of mispredicted branch instructions retired. 1314.It Li p6-br-miss-pred-taken-ret 1315Count the number of taken mispredicted branches retired. 1316.It Li p6-br-missp-exec 1317.Pq Tn "Pentium M" 1318Count the number of branch instructions executed that were 1319mispredicted at execution.
1320.It Li p6-br-ret-bac-missp-exec 1321.Pq Tn "Pentium M" 1322Count the number of return instructions executed that were 1323mispredicted at the Front End (BAC). 1324.It Li p6-br-ret-exec 1325.Pq Tn "Pentium M" 1326Count the number of return instructions executed. 1327.It Li p6-br-ret-missp-exec 1328.Pq Tn "Pentium M" 1329Count the number of return instructions executed that were 1330mispredicted at execution. 1331.It Li p6-br-taken-retired 1332Count the number of taken branches retired. 1333.It Li p6-btb-misses 1334Count the number of branches for which the BTB did not produce a 1335prediction. 1336.It Li p6-bus-bnr-drv 1337Count the number of bus clock cycles during which this processor is 1338driving the BNR# pin. 1339.It Li p6-bus-data-rcv 1340Count the number of bus clock cycles during which this processor is 1341receiving data. 1342.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier 1343Count the number of clocks during which DRDY# is asserted. 1344An additional qualifier may be specified, and comprises one of the 1345following keywords: 1346.Pp 1347.Bl -tag -width indent -compact 1348.It Li any 1349Count transactions generated by any agent on the bus. 1350.It Li self 1351Count transactions generated by this processor. 1352.El 1353.Pp 1354The default is to count operations generated by this processor. 1355.It Li p6-bus-hit-drv 1356Count the number of bus clock cycles during which this processor is 1357driving the HIT# pin. 1358.It Li p6-bus-hitm-drv 1359Count the number of bus clock cycles during which this processor is 1360driving the HITM# pin. 1361.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier 1362Count the number of clocks during which LOCK# is asserted on the 1363external system bus. 1364An additional qualifier may be specified and comprises one of the following 1365keywords: 1366.Pp 1367.Bl -tag -width indent -compact 1368.It Li any 1369Count transactions generated by any agent on the bus.
1370.It Li self 1371Count transactions generated by this processor. 1372.El 1373.Pp 1374The default is to count operations generated by this processor. 1375.It Li p6-bus-req-outstanding 1376Count the number of bus requests outstanding in any given cycle. 1377.It Li p6-bus-snoop-stall 1378Count the number of clock cycles during which the bus is snoop stalled. 1379.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier 1380Count the number of completed bus transactions of any kind. 1381An additional qualifier may be specified and comprises one of the following 1382keywords: 1383.Pp 1384.Bl -tag -width indent -compact 1385.It Li any 1386Count transactions generated by any agent on the bus. 1387.It Li self 1388Count transactions generated by this processor. 1389.El 1390.Pp 1391The default is to count operations generated by this processor. 1392.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier 1393Count the number of burst read transactions. 1394An additional qualifier may be specified and comprises one of the following 1395keywords: 1396.Pp 1397.Bl -tag -width indent -compact 1398.It Li any 1399Count transactions generated by any agent on the bus. 1400.It Li self 1401Count transactions generated by this processor. 1402.El 1403.Pp 1404The default is to count operations generated by this processor. 1405.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier 1406Count the number of completed burst transactions. 1407An additional qualifier may be specified and comprises one of the following 1408keywords: 1409.Pp 1410.Bl -tag -width indent -compact 1411.It Li any 1412Count transactions generated by any agent on the bus. 1413.It Li self 1414Count transactions generated by this processor. 1415.El 1416.Pp 1417The default is to count operations generated by this processor. 1418.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier 1419Count the number of completed deferred transactions. 
1420An additional qualifier may be specified and comprises one of the following 1421keywords: 1422.Pp 1423.Bl -tag -width indent -compact 1424.It Li any 1425Count transactions generated by any agent on the bus. 1426.It Li self 1427Count transactions generated by this processor. 1428.El 1429.Pp 1430The default is to count operations generated by this processor. 1431.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier 1432Count the number of completed instruction fetch transactions. 1433An additional qualifier may be specified and comprises one of the following 1434keywords: 1435.Pp 1436.Bl -tag -width indent -compact 1437.It Li any 1438Count transactions generated by any agent on the bus. 1439.It Li self 1440Count transactions generated by this processor. 1441.El 1442.Pp 1443The default is to count operations generated by this processor. 1444.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier 1445Count the number of completed invalidate transactions. 1446An additional qualifier may be specified and comprises one of the following 1447keywords: 1448.Pp 1449.Bl -tag -width indent -compact 1450.It Li any 1451Count transactions generated by any agent on the bus. 1452.It Li self 1453Count transactions generated by this processor. 1454.El 1455.Pp 1456The default is to count operations generated by this processor. 1457.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier 1458Count the number of completed memory transactions. 1459An additional qualifier may be specified and comprises one of the following 1460keywords: 1461.Pp 1462.Bl -tag -width indent -compact 1463.It Li any 1464Count transactions generated by any agent on the bus. 1465.It Li self 1466Count transactions generated by this processor. 1467.El 1468.Pp 1469The default is to count operations generated by this processor. 1470.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier 1471Count the number of completed partial write transactions. 
1472An additional qualifier may be specified and comprises one of the following 1473keywords: 1474.Pp 1475.Bl -tag -width indent -compact 1476.It Li any 1477Count transactions generated by any agent on the bus. 1478.It Li self 1479Count transactions generated by this processor. 1480.El 1481.Pp 1482The default is to count operations generated by this processor. 1483.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier 1484Count the number of completed read-for-ownership transactions. 1485An additional qualifier may be specified and comprises one of the following 1486keywords: 1487.Pp 1488.Bl -tag -width indent -compact 1489.It Li any 1490Count transactions generated by any agent on the bus. 1491.It Li self 1492Count transactions generated by this processor. 1493.El 1494.Pp 1495The default is to count operations generated by this processor. 1496.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier 1497Count the number of completed I/O transactions. 1498An additional qualifier may be specified and comprises one of the following 1499keywords: 1500.Pp 1501.Bl -tag -width indent -compact 1502.It Li any 1503Count transactions generated by any agent on the bus. 1504.It Li self 1505Count transactions generated by this processor. 1506.El 1507.Pp 1508The default is to count operations generated by this processor. 1509.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier 1510Count the number of completed partial transactions. 1511An additional qualifier may be specified and comprises one of the following 1512keywords: 1513.Pp 1514.Bl -tag -width indent -compact 1515.It Li any 1516Count transactions generated by any agent on the bus. 1517.It Li self 1518Count transactions generated by this processor. 1519.El 1520.Pp 1521The default is to count operations generated by this processor. 1522.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier 1523Count the number of completed write-back transactions. 
1524An additional qualifier may be specified and comprises one of the following 1525keywords: 1526.Pp 1527.Bl -tag -width indent -compact 1528.It Li any 1529Count transactions generated by any agent on the bus. 1530.It Li self 1531Count transactions generated by this processor. 1532.El 1533.Pp 1534The default is to count operations generated by this processor. 1535.It Li p6-cpu-clk-unhalted 1536Count the number of cycles during which the processor was not halted. 1537.Pp 1538.Pq Tn "Pentium M" 1539Count the number of cycles during which the processor was not halted 1540and not in a thermal trip. 1541.It Li p6-cycles-div-busy 1542Count the number of cycles during which the divider is busy and cannot 1543accept new divides. 1544This event is only allocated on counter 0. 1545.It Li p6-cycles-in-pending-and-masked 1546Count the number of processor cycles for which interrupts were 1547disabled and interrupts were pending. 1548.It Li p6-cycles-int-masked 1549Count the number of processor cycles for which interrupts were 1550disabled. 1551.It Li p6-data-mem-refs 1552Count all loads and all stores using any memory type, including 1553internal retries. 1554Each part of a split store is counted separately. 1555.It Li p6-dcu-lines-in 1556Count the total lines allocated in the data cache unit. 1557.It Li p6-dcu-m-lines-in 1558Count the number of M state lines allocated in the data cache unit. 1559.It Li p6-dcu-m-lines-out 1560Count the number of M state lines evicted from the data cache unit. 1561.It Li p6-dcu-miss-outstanding 1562Count the weighted number of cycles while a data cache unit miss is 1563outstanding, incremented by the number of outstanding cache misses at 1564any time. 1565.It Li p6-div 1566Count the number of integer and floating-point divides including 1567speculative divides. 1568This event is only allocated on counter 1. 1569.It Li p6-emon-esp-uops 1570.Pq Tn "Pentium M" 1571Count the total number of micro-ops.
1572.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier 1573.Pq Tn "Pentium M" 1574Count the number of 1575.Tn "Enhanced Intel SpeedStep" 1576transitions. 1577An additional qualifier may be specified, and can be one of the 1578following keywords: 1579.Pp 1580.Bl -tag -width indent -compact 1581.It Li all 1582Count all transitions. 1583.It Li freq 1584Count only frequency transitions. 1585.El 1586.Pp 1587The default is to count all transitions. 1588.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier 1589.Pq Tn "Pentium M" 1590Count the number of retired fused micro-ops. 1591An additional qualifier may be specified, and may be one of the 1592following keywords: 1593.Pp 1594.Bl -tag -width indent -compact 1595.It Li all 1596Count all fused micro-ops. 1597.It Li loadop 1598Count only load and op micro-ops. 1599.It Li stdsta 1600Count only STD/STA micro-ops. 1601.El 1602.Pp 1603The default is to count all fused micro-ops. 1604.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier 1605.Pq Tn "Pentium III" 1606Count the number of SSE computational instructions retired. 1607An additional qualifier may be specified, and comprises one of the 1608following keywords: 1609.Pp 1610.Bl -tag -width indent -compact 1611.It Li packed-and-scalar 1612Count packed and scalar operations. 1613.It Li scalar 1614Count scalar operations only. 1615.El 1616.Pp 1617The default is to count packed and scalar operations. 1618.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier 1619.Pq Tn "Pentium III" 1620Count the number of SSE instructions retired. 1621An additional qualifier may be specified, and comprises one of the 1622following keywords: 1623.Pp 1624.Bl -tag -width indent -compact 1625.It Li packed-and-scalar 1626Count packed and scalar operations. 1627.It Li scalar 1628Count scalar operations only. 1629.El 1630.Pp 1631The default is to count packed and scalar operations.
1632.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier 1633.Pq Tn "Pentium III" 1634Count the number of SSE prefetch or weakly ordered instructions 1635dispatched (including speculative prefetches). 1636An additional qualifier may be specified, and comprises one of the 1637following keywords: 1638.Pp 1639.Bl -tag -width indent -compact 1640.It Li nta 1641Count non-temporal prefetches. 1642.It Li t1 1643Count prefetches to L1. 1644.It Li t2 1645Count prefetches to L2. 1646.It Li wos 1647Count weakly ordered stores. 1648.El 1649.Pp 1650The default is to count non-temporal prefetches. 1651.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier 1652.Pq Tn "Pentium III" 1653Count the number of prefetch or weakly ordered instructions that miss 1654all caches. 1655An additional qualifier may be specified, and comprises one of the 1656following keywords: 1657.Pp 1658.Bl -tag -width indent -compact 1659.It Li nta 1660Count non-temporal prefetches. 1661.It Li t1 1662Count prefetches to L1. 1663.It Li t2 1664Count prefetches to L2. 1665.It Li wos 1666Count weakly ordered stores. 1667.El 1668.Pp 1669The default is to count non-temporal prefetches. 1670.It Li p6-emon-pref-rqsts-dn 1671.Pq Tn "Pentium M" 1672Count the number of downward prefetches issued. 1673.It Li p6-emon-pref-rqsts-up 1674.Pq Tn "Pentium M" 1675Count the number of upward prefetches issued. 1676.It Li p6-emon-simd-instr-retired 1677.Pq Tn "Pentium M" 1678Count the number of retired 1679.Tn MMX 1680instructions. 1681.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier 1682.Pq Tn "Pentium M" 1683Count the number of computational SSE instructions retired. 1684An additional qualifier may be specified and can be one of the 1685following keywords: 1686.Pp 1687.Bl -tag -width indent -compact 1688.It Li sse-packed-single 1689Count SSE packed-single instructions. 1690.It Li sse-scalar-single 1691Count SSE scalar-single instructions. 
1692.It Li sse2-packed-double 1693Count SSE2 packed-double instructions. 1694.It Li sse2-scalar-double 1695Count SSE2 scalar-double instructions. 1696.El 1697.Pp 1698The default is to count SSE packed-single instructions. 1699.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier 1700.Pp 1701.Pq Tn "Pentium M" 1702Count the number of SSE instructions retired. 1703An additional qualifier can be specified, and can be one of the 1704following keywords: 1705.Pp 1706.Bl -tag -width indent -compact 1707.It Li sse-packed-single 1708Count SSE packed-single instructions. 1709.It Li sse-packed-single-scalar-single 1710Count SSE packed-single and scalar-single instructions. 1711.It Li sse2-packed-double 1712Count SSE2 packed-double instructions. 1713.It Li sse2-scalar-double 1714Count SSE2 scalar-double instructions. 1715.El 1716.Pp 1717The default is to count SSE packed-single instructions. 1718.It Li p6-emon-synch-uops 1719.Pq Tn "Pentium M" 1720Count the number of sync micro-ops. 1721.It Li p6-emon-thermal-trip 1722.Pq Tn "Pentium M" 1723Count the duration or occurrences of thermal trips. 1724Use the 1725.Dq Li edge 1726qualifier to count occurrences of thermal trips. 1727.It Li p6-emon-unfusion 1728.Pq Tn "Pentium M" 1729Count the number of unfusion events in the reorder buffer. 1730.It Li p6-flops 1731Count the number of computational floating point operations retired. 1732This event is only allocated on counter 0. 1733.It Li p6-fp-assist 1734Count the number of floating point exceptions handled by microcode. 1735This event is only allocated on counter 1. 1736.It Li p6-fp-comps-ops-exe 1737Count the number of computational floating point operations executed. 1738This event is only allocated on counter 0. 1739.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier 1740.Pq Tn "Pentium II" , Tn "Pentium III" 1741Count the number of transitions between MMX and floating-point 1742instructions.
1743An additional qualifier may be specified, and comprises one of the 1744following keywords: 1745.Pp 1746.Bl -tag -width indent -compact 1747.It Li mmxtofp 1748Count transitions from MMX instructions to floating-point instructions. 1749.It Li fptommx 1750Count transitions from floating-point instructions to MMX instructions. 1751.El 1752.Pp 1753The default is to count MMX to floating-point transitions. 1754.It Li p6-hw-int-rx 1755Count the number of hardware interrupts received. 1756.It Li p6-ifu-fetch 1757Count the number of instruction fetches, both cacheable and non-cacheable. 1758.It Li p6-ifu-fetch-miss 1759Count the number of instruction fetch misses (i.e., those that produce 1760memory accesses). 1761.It Li p6-ifu-mem-stall 1762Count the number of cycles instruction fetch is stalled for any reason. 1763.It Li p6-ild-stall 1764Count the number of cycles the instruction length decoder is stalled. 1765.It Li p6-inst-decoded 1766Count the number of instructions decoded. 1767.It Li p6-inst-retired 1768Count the number of instructions retired. 1769.It Li p6-itlb-miss 1770Count the number of instruction TLB misses. 1771.It Li p6-l2-ads 1772Count the number of L2 address strobes. 1773.It Li p6-l2-dbus-busy 1774Count the number of cycles during which the L2 cache data bus was busy. 1775.It Li p6-l2-dbus-busy-rd 1776Count the number of cycles during which the L2 cache data bus was busy 1777transferring read data from L2 to the processor. 1778.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier 1779Count the number of L2 instruction fetches. 1780An additional qualifier may be specified and comprises a list of the following 1781keywords separated by 1782.Ql + 1783characters: 1784.Pp 1785.Bl -tag -width indent -compact 1786.It Li e 1787Count operations affecting E (exclusive) state lines. 1788.It Li i 1789Count operations affecting I (invalid) state lines. 1790.It Li m 1791Count operations affecting M (modified) state lines. 
1792.It Li s 1793Count operations affecting S (shared) state lines. 1794.El 1795.Pp 1796The default is to count operations affecting all (MESI) state lines. 1797.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier 1798Count the number of L2 data loads. 1799An additional qualifier may be specified and comprises a list of the following 1800keywords separated by 1801.Ql + 1802characters: 1803.Pp 1804.Bl -tag -width indent -compact 1805.It Li both 1806.Pq Tn "Pentium M" 1807Count both hardware-prefetched lines and non-hardware-prefetched lines. 1808.It Li e 1809Count operations affecting E (exclusive) state lines. 1810.It Li hw 1811.Pq Tn "Pentium M" 1812Count hardware-prefetched lines only. 1813.It Li i 1814Count operations affecting I (invalid) state lines. 1815.It Li m 1816Count operations affecting M (modified) state lines. 1817.It Li nonhw 1818.Pq Tn "Pentium M" 1819Exclude hardware-prefetched lines. 1820.It Li s 1821Count operations affecting S (shared) state lines. 1822.El 1823.Pp 1824The default on processors other than 1825.Tn "Pentium M" 1826processors is to count operations affecting all (MESI) state lines. 1827The default on 1828.Tn "Pentium M" 1829processors is to count both hardware-prefetched and 1830non-hardware-prefetch operations on all (MESI) state lines. 1831.Pq Errata 1832This event is affected by processor errata E53. 1833.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier 1834Count the number of L2 lines allocated. 1835An additional qualifier may be specified and comprises a list of the following 1836keywords separated by 1837.Ql + 1838characters: 1839.Pp 1840.Bl -tag -width indent -compact 1841.It Li both 1842.Pq Tn "Pentium M" 1843Count both hardware-prefetched lines and non-hardware-prefetched lines. 1844.It Li e 1845Count operations affecting E (exclusive) state lines. 1846.It Li hw 1847.Pq Tn "Pentium M" 1848Count hardware-prefetched lines only. 1849.It Li i 1850Count operations affecting I (invalid) state lines. 
1851.It Li m 1852Count operations affecting M (modified) state lines. 1853.It Li nonhw 1854.Pq Tn "Pentium M" 1855Exclude hardware-prefetched lines. 1856.It Li s 1857Count operations affecting S (shared) state lines. 1858.El 1859.Pp 1860The default on processors other than 1861.Tn "Pentium M" 1862processors is to count operations affecting all (MESI) state lines. 1863The default on 1864.Tn "Pentium M" 1865processors is to count both hardware-prefetched and 1866non-hardware-prefetch operations on all (MESI) state lines. 1867.Pq Errata 1868This event is affected by processor errata E45. 1869.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier 1870Count the number of L2 lines evicted. 1871An additional qualifier may be specified and comprises a list of the following 1872keywords separated by 1873.Ql + 1874characters: 1875.Pp 1876.Bl -tag -width indent -compact 1877.It Li both 1878.Pq Tn "Pentium M" 1879Count both hardware-prefetched lines and non-hardware-prefetched lines. 1880.It Li e 1881Count operations affecting E (exclusive) state lines. 1882.It Li hw 1883.Pq Tn "Pentium M" 1884Count hardware-prefetched lines only. 1885.It Li i 1886Count operations affecting I (invalid) state lines. 1887.It Li m 1888Count operations affecting M (modified) state lines. 1889.It Li nonhw 1890.Pq Tn "Pentium M" only 1891Exclude hardware-prefetched lines. 1892.It Li s 1893Count operations affecting S (shared) state lines. 1894.El 1895.Pp 1896The default on processors other than 1897.Tn "Pentium M" 1898processors is to count operations affecting all (MESI) state lines. 1899The default on 1900.Tn "Pentium M" 1901processors is to count both hardware-prefetched and 1902non-hardware-prefetch operations on all (MESI) state lines. 1903.Pq Errata 1904This event is affected by processor errata E45. 1905.It Li p6-l2-m-lines-inm 1906Count the number of modified lines allocated in L2 cache. 
1907.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier 1908Count the number of L2 M-state lines evicted. 1909.Pp 1910.Pq Tn "Pentium M" 1911On these processors an additional qualifier may be specified and 1912comprises a list of the following keywords separated by 1913.Ql + 1914characters: 1915.Pp 1916.Bl -tag -width indent -compact 1917.It Li both 1918Count both hardware-prefetched lines and non-hardware-prefetched lines. 1919.It Li hw 1920Count hardware-prefetched lines only. 1921.It Li nonhw 1922Exclude hardware-prefetched lines. 1923.El 1924.Pp 1925The default is to count both hardware-prefetched and 1926non-hardware-prefetch operations. 1927.Pq Errata 1928This event is affected by processor errata E53. 1929.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier 1930Count the total number of L2 requests. 1931An additional qualifier may be specified and comprises a list of the following 1932keywords separated by 1933.Ql + 1934characters: 1935.Pp 1936.Bl -tag -width indent -compact 1937.It Li e 1938Count operations affecting E (exclusive) state lines. 1939.It Li i 1940Count operations affecting I (invalid) state lines. 1941.It Li m 1942Count operations affecting M (modified) state lines. 1943.It Li s 1944Count operations affecting S (shared) state lines. 1945.El 1946.Pp 1947The default is to count operations affecting all (MESI) state lines. 1948.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier 1949Count the number of L2 data stores. 1950An additional qualifier may be specified and comprises a list of the following 1951keywords separated by 1952.Ql + 1953characters: 1954.Pp 1955.Bl -tag -width indent -compact 1956.It Li e 1957Count operations affecting E (exclusive) state lines. 1958.It Li i 1959Count operations affecting I (invalid) state lines. 1960.It Li m 1961Count operations affecting M (modified) state lines. 1962.It Li s 1963Count operations affecting S (shared) state lines. 1964.El 1965.Pp 1966The default is to count operations affecting all (MESI) state lines.
.It Li p6-ld-blocks
Count the number of load operations delayed due to store buffer blocks.
.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (crossing a
64-bit boundary).
.It Li p6-mmx-assist
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX assists executed.
.It Li p6-mmx-instr-exec
.Pq Tn Celeron , Tn "Pentium II"
Count the number of MMX instructions executed, except MOVQ and MOVD
stores from register to memory.
.It Li p6-mmx-instr-ret
.Pq Tn "Pentium II"
Count the number of MMX instructions retired.
.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX instructions executed.
An additional qualifier may be specified and comprises a list of
the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li pack
Count MMX pack operation instructions.
.It Li packed-arithmetic
Count MMX packed arithmetic instructions.
.It Li packed-logical
Count MMX packed logical instructions.
.It Li packed-multiply
Count MMX packed multiply instructions.
.It Li packed-shift
Count MMX packed shift instructions.
.It Li unpack
Count MMX unpack operation instructions.
.El
.Pp
The default is to count all operations.
.It Li p6-mmx-sat-instr-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX saturating instructions executed.
.It Li p6-mmx-uops-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX micro-ops executed.
.It Li p6-mul
Count the number of integer and floating-point multiplies, including
speculative multiplies.
This event is only allocated on counter 1.
.It Li p6-partial-rat-stalls
Count the number of cycles or events for partial stalls.
.It Li p6-resource-stalls
Count the number of cycles there was a resource-related stall of any kind.
.It Li p6-ret-seg-renames
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register rename events retired.
.It Li p6-sb-drains
Count the number of cycles the store buffer is draining.
.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renames.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count renames for segment register DS.
.It Li es
Count renames for segment register ES.
.It Li fs
Count renames for segment register FS.
.It Li gs
Count renames for segment register GS.
.El
.Pp
The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renaming stalls.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count stalls for segment register DS.
.It Li es
Count stalls for segment register ES.
.It Li fs
Count stalls for segment register FS.
.It Li gs
Count stalls for segment register GS.
.El
.Pp
The default is to count operations affecting all the segment registers.
.It Li p6-segment-reg-loads
Count the number of segment register loads.
.It Li p6-uops-retired
Count the number of micro-ops retired.
.El
.Ss Intel P4 PMCS
Intel P4 PMCs are present in Intel
.Tn "Pentium 4"
and
.Tn Xeon
processors.
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
Further information about using these PMCs may be found in
.Rs
.%B "IA-32 Intel(R) Architecture Optimization Guide"
.%D 2003
.%N "Order Number 248966-009"
.%Q "Intel Corporation"
.Re
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
.%N "Document Number: 249199-059"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel P4 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li active= Ns Ar choice
(On P4 HTT CPUs) Filter event counting based on which logical
processors are active.
The allowed values of
.Ar choice
are:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count when either logical processor is active.
.It Li both
Count when both logical processors are active.
.It Li none
Count only when neither logical processor is active.
.It Li single
Count only when one logical processor is active.
.El
.Pp
The default is
.Dq Li both .
.It Li cascade
Configure the PMC to cascade onto its partner.
See
.Sx "Cascading P4 PMCs"
below for more information.
.It Li edge
Configure the counter to count false-to-true transitions of the threshold
comparison output.
This qualifier only takes effect if a threshold qualifier has also been
specified.
.It Li complement
Configure the counter to increment only when the event count seen is
less than the threshold qualifier value specified.
.It Li mask= Ns Ar qualifier
Many event specifiers for Intel P4 PMCs need to be additionally
qualified using a mask qualifier.
The allowed syntax for these qualifiers is event specific and is
described along with the events.
.It Li os
Configure the PMC to count when the CPL of the processor is 0.
.It Li precise
Select precise event based sampling.
Precise sampling is supported by the hardware for a limited set of
events.
.It Li tag= Ns Ar value
Configure the PMC to tag the internal uop selected by the other
fields in this event specifier with value
.Ar value .
This feature is used when cascading PMCs.
.It Li threshold= Ns Ar value
Configure the PMC to increment only when the event counts seen are
greater than the specified threshold value
.Ar value .
.It Li usr
Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
On Intel Pentium 4 processors with HTT, events are
divided into two classes:
.Pp
.Bl -tag -width indent -compact
.It "TS Events"
are those where hardware can differentiate between events
generated on one logical processor from those generated on the
other.
.It "TI Events"
are those where hardware cannot differentiate between events
generated by multiple logical processors in a package.
.El
.Pp
Only TS events are allowed for use with process-mode PMCs on
Pentium-4/HTT CPUs.
.Pp
The event specifiers supported by Intel P4 PMCs are:
.Pp
.Bl -tag -width indent
.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 128 bit SIMD integer operands in memory or
XMM register.
.El
.Pp
If an instruction contains more than one 128 bit MMX uop, then each
uop will be counted.
.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count MMX instructions that operate on 64 bit SIMD operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 64 bit SIMD integer operands in memory or
in MMX registers.
.El
.Pp
If an instruction contains more than one 64 bit MMX uop, then each
uop will be counted.
.It Li p4-b2b-cycles
.Pq "TI event"
Count back-to-back bus cycles.
Further documentation for this event is unavailable.
.It Li p4-bnr
.Pq "TI event"
Count bus-not-ready conditions.
Further documentation for this event is unavailable.
.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count instruction fetch requests qualified by additional
flags specified in
.Ar qualifier .
At this point only one flag is supported:
.Pp
.Bl -tag -width indent -compact
.It Li tcmiss
Count trace cache lookup misses.
.El
.Pp
The default qualifier is also
.Dq Li mask=tcmiss .
.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count retired branches.
Qualifier
.Ar flags
is a list of the following
.Ql +
separated strings:
.Pp
.Bl -tag -width indent -compact
.It Li mmnp
Count branches not-taken and predicted.
.It Li mmnm
Count branches not-taken and mis-predicted.
.It Li mmtp
Count branches taken and predicted.
.It Li mmtm
Count branches taken and mis-predicted.
.El
.Pp
The default qualifier counts all four kinds of branches.
.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count the number of entries (clipped at 15) currently active in the
BSQ.
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this 2-bit number.
.It Li req-len0 , Li req-len1
Forms a 2-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this 2-bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
Edge triggering using the
.Dq Li edge
qualifier should not be used with this event when counting cycles.
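.Pp
As an illustration only, the bit-field flags above can be derived from
the numeric encodings with a small helper.
The sketch below is not part of
.Lb libpmc ;
the function names
.Fn append_flag ,
.Fn append_bits
and
.Fn bsq_spec
are hypothetical, and the code only builds an event specifier string.
An encoding of 0 contributes no flags, and if no flag at all is selected
the mask qualifier is omitted, in which case the default described above
applies.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one flag to a '+'-separated list. */
static int
append_flag(char *buf, size_t len, const char *flag)
{
	size_t used = strlen(buf);
	int n;

	assert(used < len);
	n = snprintf(buf + used, len - used, "%s%s",
	    used > 0 ? "+" : "", flag);
	return ((n >= 0 && (size_t)n < len - used) ? 0 : -1);
}

/*
 * Hypothetical helper: append the names of the bits set in 'value'.
 * names[0] names the least significant bit, so the last name is the
 * MSB, matching the req-type1, req-len1 and mem-type2 notes above.
 */
static int
append_bits(char *buf, size_t len, unsigned value,
    const char *const names[], unsigned nbits)
{
	unsigned i;

	assert(nbits < 32 && value < (1u << nbits));
	for (i = 0; i < nbits; i++)
		if ((value & (1u << i)) != 0 &&
		    append_flag(buf, len, names[i]) != 0)
			return (-1);
	return (0);
}

/* Build "<event>,mask=<flags>" from a request type and a memory type. */
static int
bsq_spec(char *buf, size_t len, const char *event,
    unsigned req_type, unsigned mem_type)
{
	static const char *const req_names[] =
	    { "req-type0", "req-type1" };
	static const char *const mem_names[] =
	    { "mem-type0", "mem-type1", "mem-type2" };
	char mask[128] = "";
	int n;

	if (append_bits(mask, sizeof(mask), req_type, req_names, 2) != 0 ||
	    append_bits(mask, sizeof(mask), mem_type, mem_names, 3) != 0)
		return (-1);
	n = snprintf(buf, len, "%s%s%s", event,
	    mask[0] != 0 ? ",mask=" : "", mask);
	return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}
```
.Ed
.Pp
For example, request type 2 (writes other than writebacks) combined with
memory type 6 (WB) yields the mask
.Dq Li req-type1+mem-type1+mem-type2 .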
.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count allocations in the bus sequence unit according to the flags
specified in
.Ar qualifier ,
which is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this 2-bit number.
.It Li req-len0 , Li req-len1
Forms a 2-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this 2-bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
This event is usually used along with the
.Dq Li edge
qualifier to avoid multiple counting.
.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count cache references as seen by the bus unit (2nd or 3rd level
cache references).
Qualifier
.Ar qualifier
is a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li rd-2ndl-hits
Count 2nd level cache hits in the shared state.
.It Li rd-2ndl-hite
Count 2nd level cache hits in the exclusive state.
.It Li rd-2ndl-hitm
Count 2nd level cache hits in the modified state.
.It Li rd-3rdl-hits
Count 3rd level cache hits in the shared state.
.It Li rd-3rdl-hite
Count 3rd level cache hits in the exclusive state.
.It Li rd-3rdl-hitm
Count 3rd level cache hits in the modified state.
.It Li rd-2ndl-miss
Count 2nd level cache misses.
.It Li rd-3rdl-miss
Count 3rd level cache misses.
.It Li wr-2ndl-miss
Count write-back lookups from the data access cache that miss the 2nd
level cache.
.El
.Pp
The default is to count all the above events.
.It Li p4-execution-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the execution
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
The marked uops are not bogus.
.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to set all the above flags.
This event can be used for precise event based sampling.
.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the front-end
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to select both kinds of events.
This event can be used for precise event based sampling.
.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each DBSY or DRDY event selected by qualifier
.Ar flags .
Qualifier
.Ar flags
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li drdy-drv
Count when this processor is driving data onto the bus.
.It Li drdy-own
Count when this processor is reading data from the bus.
.It Li drdy-other
Count when data is on the bus but not being sampled by this processor.
.It Li dbsy-drv
Count when this processor reserves the bus for use in the next cycle
in order to drive data.
.It Li dbsy-own
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will sample.
.It Li dbsy-other
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will not sample.
.El
.Pp
Flags
.Dq Li drdy-own
and
.Dq Li drdy-other
are mutually exclusive.
Flags
.Dq Li dbsy-own
and
.Dq Li dbsy-other
are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
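.Pp
The mutual exclusion rules above can be checked before a specifier is
used.
The following is a minimal sketch under the assumption that the caller
holds the
.Ql +
separated flag list as a C string; the helpers
.Fn has_flag
and
.Fn fsb_flags_ok
are hypothetical and are not provided by
.Lb libpmc .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'flag' occurs as a complete
 * token in the '+'-separated string 'list'.
 */
static bool
has_flag(const char *list, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = list;

	while (*p != 0) {
		size_t tlen = strcspn(p, "+");

		if (tlen == flen && strncmp(p, flag, flen) == 0)
			return (true);
		p += tlen;
		if (*p == '+')
			p++;
	}
	return (false);
}

/*
 * Hypothetical check: reject flag lists that combine the mutually
 * exclusive pairs documented for p4-fsb-data-activity.
 */
static bool
fsb_flags_ok(const char *list)
{
	if (has_flag(list, "drdy-own") && has_flag(list, "drdy-other"))
		return (false);
	if (has_flag(list, "dbsy-own") && has_flag(list, "dbsy-other"))
		return (false);
	return (true);
}
```
.Ed
.Pp
With these helpers the documented default flag list is accepted, while
a list such as
.Dq Li drdy-own+drdy-other
is rejected.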
.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count cycles during which the processor is not stopped.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li running
Count cycles when the processor is active.
.El
.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogusntag
Count non-bogus instructions that are not tagged.
.It Li nbogustag
Count non-bogus instructions that are tagged.
.It Li bogusntag
Count bogus instructions that are not tagged.
.It Li bogustag
Count bogus instructions that are tagged.
.El
.Pp
The default qualifier counts all the above kinds of instructions.
.It Li p4-ioq-active-entries Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count the number of entries (clipped at 15) in the IOQ that are
active.
The event masks are specified by qualifier
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can be additionally used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier should not be used when counting cycles with this event.
The exact behaviour of this event depends on the processor revision.
.It Li p4-ioq-allocation Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count various types of transactions on the bus matching the flags set
in
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can be additionally used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier is normally used with this event to prevent multiple
counting.
The exact behaviour of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li hit
Count ITLB hits.
.It Li miss
Count ITLB misses.
.It Li hit-uc
Count uncacheable ITLB hits.
.El
.Pp
If no
.Ar qualifier
is specified the default is to count all three kinds of ITLB
translations.
.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count replayed events at the load port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-ld
Count split loads.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-ld .
.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted IA-32 branch instructions.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count non-bogus retired branch instructions.
.El
.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of pipeline clears seen by the processor.
Qualifier
.Ar flags
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li clear
Count for a portion of the many cycles when the machine is being
cleared for any reason.
.It Li moclear
Count machine clears due to memory ordering issues.
.It Li smclear
Count machine clears due to self-modifying code.
.El
.Pp
Use qualifier
.Dq Li edge
to get a count of occurrences of machine clears.
The default qualifier is
.Dq Li clear .
.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the cancelling of various kinds of requests in the data cache
address control unit of the CPU.
The qualifier
.Ar event-list
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li st-rb-full
Requests cancelled because no store request buffer was available.
.It Li 64k-conf
Requests that conflict due to 64K aliasing.
.El
.Pp
If
.Ar event-list
is not specified, then the default is to count both kinds of events.
.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the completion of load split, store split, uncacheable split and
uncacheable load operations selected by qualifier
.Ar event-list .
The qualifier
.Ar event-list
is a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li lsc
Count load splits completed, excluding loads from uncacheable or
write-combining areas.
.It Li ssc
Count any split stores completed.
.El
.Pp
The default is to count both kinds of operations.
.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count load replays triggered by the memory order buffer.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li no-sta
Count replays because of unknown store addresses.
.It Li no-std
Count replays because of unknown store data.
.It Li partial-data
Count replays because of partially overlapped data accesses between
load and store operations.
.It Li unalgn-addr
Count replays because of mismatches in the lower 4 bits of load and
store operations.
.El
.Pp
The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed double-precision operands.
.El
.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed single-precision operands.
.El
.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count page walks performed by the page miss handler.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dtmiss
Count page walks for data TLB misses.
.It Li itmiss
Count page walks for instruction TLB misses.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li dtmiss+itmiss .
.It Li p4-replay-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the replay
tagging mechanism.
Qualifier
.Ar flags
contains a
.Ql +
separated set of the following strings:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default qualifier counts both kinds of uops.
This event can be used for precise event based sampling.
.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the occurrence or latency of stalls in the allocator.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li sbfull
A stall due to the lack of store buffers.
.El
.It Li p4-response
.Pq "TI event"
Count different types of responses.
Further documentation on this event is not available.
.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count direct and indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count the number of scalar double-precision uops.
.El
.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar single-precision operands.
.El
.It Li p4-snoop
.Pq "TI event"
Count snoop traffic.
Further documentation on this event is not available.
.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times an assist is required to handle problems
with the operands for SSE and SSE2 operations.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count assists for all SSE and SSE2 uops.
.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, or in machine clear, or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, or in machine clear, or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
3069.It Li p4-uop-type Op Li ,mask= Ns Ar flags 3070.Pq "TS event" 3071This event is used in conjunction with the front-end at-retirement 3072mechanism to tag load and store uops. 3073Qualifier 3074.Ar flags 3075comprises the following strings separated by 3076.Ql + 3077characters: 3078.Pp 3079.Bl -tag -width indent -compact 3080.It Li tagloads 3081Mark uops that are load operations. 3082.It Li tagstores 3083Mark uops that are store operations. 3084.El 3085.Pp 3086The default qualifier counts both kinds of uops. 3087.It Li p4-uops-retired Op Li ,mask= Ns Ar flags 3088.Pq "TS event" 3089Count uops retired during a clock cycle. 3090Qualifier 3091.Ar flags 3092comprises the following strings separated by 3093.Ql + 3094characters: 3095.Pp 3096.Bl -tag -width indent -compact 3097.It Li nbogus 3098Count marked uops that are not bogus. 3099.It Li bogus 3100Count marked uops that are bogus. 3101.El 3102.Pp 3103The default qualifier counts both kinds of uops. 3104.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags 3105.Pq "TI event" 3106Count write-combining buffer operations. 3107Qualifier 3108.Ar flags 3109contains the following strings separated by 3110.Ql + 3111characters: 3112.Pp 3113.Bl -tag -width indent -compact 3114.It Li wcb-evicts 3115WC buffer evictions due to any cause. 3116.It Li wcb-full-evict 3117WC buffer evictions due to no WC buffer being available. 3118.El 3119.Pp 3120The default qualifier counts both kinds of evictions. 3121.It Li p4-x87-assist Op Li ,mask= Ns Ar flags 3122.Pq "TS event" 3123Count the retirement of x87 instructions that required special 3124handling. 3125Qualifier 3126.Ar flags 3127contains the following strings separated by 3128.Ql + 3129characters: 3130.Pp 3131.Bl -tag -width indent -compact 3132.It Li fpsu 3133Count instructions that saw an FP stack underflow. 3134.It Li fpso 3135Count instructions that saw an FP stack overflow. 3136.It Li poao 3137Count instructions that saw an x87 output overflow.
3138.It Li poau 3139Count instructions that saw an x87 output underflow. 3140.It Li prea 3141Count instructions that needed an x87 input assist. 3142.El 3143.Pp 3144The default qualifier counts all the above types of instruction 3145retirements. 3146.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags 3147.Pq "TI event" 3148Count x87 floating-point uops. 3149Qualifier 3150.Ar flags 3151can take the following value (which is also the default): 3152.Pp 3153.Bl -tag -width indent -compact 3154.It Li all 3155Count all x87 floating-point uops. 3156.El 3157.Pp 3158If an instruction contains more than one x87 floating-point uop, then 3159all x87 floating-point uops will be counted. 3160This event does not count x87 floating-point data movement operations. 3161.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags 3162.Pq "TI event" 3163Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores 3164data, or performs a register-to-register move. 3165This event does not count integer move uops. 3166Qualifier 3167.Ar flags 3168may contain the following keywords separated by 3169.Ql + 3170characters: 3171.Pp 3172.Bl -tag -width indent -compact 3173.It Li allp0 3174Count all x87 and SIMD store and move uops. 3175.It Li allp2 3176Count all x87 and SIMD load uops. 3177.El 3178.Pp 3179The default is to count all uops. 3180.Pq Errata 3181This event may be affected by processor erratum N43. 3182.El 3183.Ss "Cascading P4 PMCs" 3184PMC cascading support is currently poorly implemented. 3185While individual event counters may be allocated with a 3186.Dq Li cascade 3187qualifier, the current API does not offer the ability 3188to name and allocate all the resources needed for a 3189cascaded event counter pair in a single operation. 3190.Ss "Precise Event Based Sampling" 3191Support for precise event based sampling is currently 3192unimplemented.
3193.Sh COMPATIBILITY 3194The interface between the 3195.Nm pmc 3196library and the 3197.Xr hwpmc 4 3198driver is intended to be private to the implementation and may 3199change. 3200In order to ease forward compatibility with future versions of the 3201.Xr hwpmc 4 3202driver, applications are urged to dynamically link with the 3203.Nm pmc 3204library. 3205.Pp 3206The 3207.Nm pmc 3208API is 3209.Ud 3210.Sh SEE ALSO 3211.Xr pmclog 3 , 3212.Xr hwpmc 4 , 3213.Xr pmccontrol 8 , 3214.Xr pmcstat 8 3215.Sh HISTORY 3216The 3217.Nm pmc 3218library first appeared in 3219.Fx 6.0 . 3220.Sh AUTHORS 3221The 3222.Lb libpmc 3223library was written by 3224.An "Joseph Koshy" 3225.Aq jkoshy@FreeBSD.org . 3226