.\" Copyright (c) 2003-2007 Joseph Koshy. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd November 25, 2007
.Os
.Dt PMC 3
.Sh NAME
.Nm pmc
.Nd library for accessing hardware performance monitoring counters
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
The
.Lb libpmc
provides a programming interface that allows applications to use
hardware performance counters to gather performance data about
specific processes or for the system as a whole.
The library is implemented using the lower-level facilities offered by
the
.Xr hwpmc 4
driver.
.Ss Key Concepts
Performance monitoring counters (PMCs) are represented by the library
using a software abstraction.
These
.Dq abstract
PMCs can have one of two scopes:
.Bl -bullet
.It
System scope.
These PMCs measure events in a whole-system manner, i.e., independent
of the currently executing thread.
System scope PMCs are allocated on specific CPUs and do not
migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
.Xr hwpmc 4
sysctl tunable
.Va security.bsd.unprivileged_syspmcs
is non-zero.
.It
Process scope.
These PMCs only measure hardware events when the processes they are
attached to are executing on a CPU.
In an SMP system, process scope PMCs migrate between CPUs along with
their target processes.
.El
.Pp
Orthogonal to PMC scope, PMCs may be allocated in one of two
operational modes:
.Bl -bullet
.It
Counting PMCs measure events according to their scope
(system or process).
The application needs to explicitly read these counters
to retrieve their value.
.It
Sampling PMCs cause the CPU to be periodically interrupted
and information about its state of execution to be collected.
Sampling PMCs are used to profile specific processes and kernel
threads or to profile the system as a whole.
.El
.Pp
The scope and operational mode for a software PMC are specified at
PMC allocation time.
An application is allowed to allocate multiple PMCs subject
to availability of hardware resources.
.Pp
The library uses human-readable strings to name the event being
measured by hardware.
The syntax used for specifying a hardware event along with additional
event specific qualifiers (if any) is described in detail in section
.Sx "EVENT SPECIFIERS"
below.
.Pp
PMCs are associated with the process that allocated them and
will be automatically reclaimed by the system when the process exits.
Additionally, process-scope PMCs have to be attached to one or more
target processes before they can perform measurements.
A process-scope PMC may be attached to those target processes
that its owner process would otherwise be permitted to debug.
An owner process may attach PMCs to itself allowing
it to measure its own behavior.
Additionally, on some machine architectures, such self-attached PMCs
may be read cheaply using specialized instructions supported by the
processor.
.Pp
Certain kinds of PMCs require that a log file be configured before
they may be started.
These include:
.Bl -bullet -compact
.It
System scope sampling PMCs.
.It
Process scope sampling PMCs.
.It
Process scope counting PMCs that have been configured to report PMC
readings on process context switches or process exits.
.El
Up to one log file may be configured per owner process.
Events logged to a log file may be subsequently analyzed using the
.Xr pmclog 3
family of functions.
.Ss Supported CPUs
The CPUs known to the PMC library are named by the
.Vt "enum pmc_cputype"
enumeration.
Supported CPUs include:
.Bl -tag -width PMC_CPU_INTEL_PIII -compact
.It PMC_CPU_AMD_K7
.Tn "AMD Athlon"
CPUs.
.It PMC_CPU_AMD_K8
.Tn "AMD Athlon64"
CPUs.
.It PMC_CPU_INTEL_P6
.Tn Intel
.Tn "Pentium Pro"
CPUs.
.It PMC_CPU_INTEL_PII
.Tn "Intel Pentium II"
CPUs.
.It PMC_CPU_INTEL_PIII
.Tn "Intel Pentium III"
CPUs.
.It PMC_CPU_INTEL_PM
.Tn "Intel Pentium M"
CPUs.
.It PMC_CPU_INTEL_PIV
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss Supported PMCs
PMCs supported by this library are named by the
.Vt "enum pmc_class"
enumeration.
Supported PMC kinds include:
.Bl -tag -width PMC_CLASS_TSC -compact
.It PMC_CLASS_TSC
The timestamp counter on i386 and amd64 architecture CPUs.
.It PMC_CLASS_K7
Programmable hardware counters present in
.Tn "AMD Athlon"
CPUs.
.It PMC_CLASS_K8
Programmable hardware counters present in
.Tn "AMD Athlon64"
CPUs.
.It PMC_CLASS_P6
Programmable hardware counters present in
.Tn Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn "Pentium III" ,
.Tn "Celeron" ,
and
.Tn "Pentium M"
CPUs.
.It PMC_CLASS_P4
Programmable hardware counters present in
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss PMC Capabilities
Capabilities of performance monitoring hardware are denoted using
the
.Vt "enum pmc_caps"
enumeration.
Supported capabilities include:
.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
.It PMC_CAP_EDGE
The ability to count negated to asserted transitions of the hardware
conditions being probed for.
.It PMC_CAP_INTERRUPT
The ability to interrupt the CPU.
.It PMC_CAP_INVERT
The ability to invert the sense of the hardware conditions being
measured.
.It PMC_CAP_READ
PMC hardware allows the CPU to read performance counters.
.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
system dependent way.
.It PMC_CAP_SYSTEM
The ability to restrict counting of hardware events to when the CPU is
running privileged code.
.It PMC_CAP_THRESHOLD
The ability to ignore simultaneous hardware events below a
programmable threshold.
.It PMC_CAP_USER
The ability to restrict counting of hardware events to when the
CPU is running unprivileged code.
.It PMC_CAP_WRITE
PMC hardware allows the CPU to write to counters.
.El
.Ss Functional Grouping
This section contains a brief overview of the available functionality
in the PMC library.
Each function listed here is described further in its own manual page.
.Bl -tag -width indent
.It Administration
.Bl -tag -compact
.It Fn pmc_disable , Fn pmc_enable
Administratively disable (enable) specific performance monitoring
counter hardware.
Counters that are disabled will not be available to applications to
use.
.El
.It "Convenience Functions"
.Bl -tag -compact
.It Fn pmc_event_names_of_class
Returns a list of event names supported by a given PMC type.
.It Fn pmc_name_of_capability
Convert a
.Dv PMC_CAP_*
flag to a human-readable string.
.It Fn pmc_name_of_class
Convert a
.Dv PMC_CLASS_*
constant to a human-readable string.
.It Fn pmc_name_of_cputype
Return a human-readable name for a CPU type.
.It Fn pmc_name_of_disposition
Return a human-readable string describing a PMC's disposition.
.It Fn pmc_name_of_event
Convert a numeric event code to a human-readable string.
.It Fn pmc_name_of_mode
Convert a
.Dv PMC_MODE_*
constant to a human-readable name.
.It Fn pmc_name_of_state
Return a human-readable string describing a PMC's current state.
.El
.It "Library Initialization"
.Bl -tag -compact
.It Fn pmc_init
Initialize the library.
This function must be called before any other library function.
.El
.It "Log File Handling"
.Bl -tag -compact
.It Fn pmc_configure_logfile
Configure a log file for
.Xr hwpmc 4
to write logged events to.
.It Fn pmc_flush_logfile
Flush all pending log data in
.Xr hwpmc 4 Ns Ap s
buffers.
.It Fn pmc_writelog
Append arbitrary user data to the current log file.
.El
.It "PMC Management"
.Bl -tag -compact
.It Fn pmc_allocate , Fn pmc_release
Allocate (free) a PMC.
.It Fn pmc_attach , Fn pmc_detach
Attach (detach) a process scope PMC to a target.
.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
Read (write) a value from (to) a PMC.
.It Fn pmc_start , Fn pmc_stop
Start (stop) a software PMC.
.It Fn pmc_set
Set the reload value for a sampling PMC.
.El
.It "Queries"
.Bl -tag -compact
.It Fn pmc_capabilities
Retrieve the capabilities for a given PMC.
.It Fn pmc_cpuinfo
Retrieve information about the CPUs and PMC hardware present in the
system.
.It Fn pmc_get_driver_stats
Retrieve statistics maintained by
.Xr hwpmc 4 .
.It Fn pmc_ncpu
Determine the number of CPUs in the system.
.It Fn pmc_npmc
Return the number of hardware PMCs present in a given CPU.
.It Fn pmc_pmcinfo
Return information about the state of a given CPU's PMCs.
.It Fn pmc_width
Determine the width of a hardware counter in bits.
.El
.It "x86 Architecture Specific API"
.Bl -tag -compact
.It Fn pmc_get_msr
Returns the processor model specific register number
associated with
.Fa pmc .
Applications may then use the x86
.Ic RDPMC
instruction to directly read the contents of the PMC.
.El
.El
.Ss Signal Handling Requirements
Applications using PMCs are required to handle the following signals:
.Bl -tag -width ".Dv SIGBUS"
.It Dv SIGBUS
When the
.Xr hwpmc 4
module is unloaded using
.Xr kldunload 8 ,
processes that have PMCs allocated to them will be sent a
.Dv SIGBUS
signal.
.It Dv SIGIO
The
.Xr hwpmc 4
driver will send a PMC owning process a
.Dv SIGIO
signal if:
.Bl -bullet
.It
Any process-mode PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
configured log file.
This error may be retrieved by a subsequent call to
.Fn pmc_flush_logfile .
.El
.El
.Ss Typical Program Flow
.Bl -enum
.It
An application would first invoke function
.Fn pmc_init
to allow the library to initialize itself.
.It
Signal handling would then be set up.
.It
Next the application would allocate the PMCs it desires using function
.Fn pmc_allocate .
.It
Initial values for PMCs may be set using function
.Fn pmc_set .
.It
If a log file is necessary for the PMCs to work, it would
be configured using function
.Fn pmc_configure_logfile .
.It
Process scope PMCs would then be attached to their target processes
using function
.Fn pmc_attach .
.It
The PMCs would then be started using function
.Fn pmc_start .
.It
Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
For PMCs that write events to the log file, this logged data would be
read and parsed using the
.Xr pmclog 3
family of functions.
.It
PMCs are stopped using function
.Fn pmc_stop ,
and process scope PMCs are detached from their targets using
function
.Fn pmc_detach .
.It
Before the process exits, it may release its PMCs using function
.Fn pmc_release .
Any configured log file may be closed using function
.Fn pmc_configure_logfile .
.El
.Sh EVENT SPECIFIERS
Event specifiers are strings consisting of an event name, followed by
optional parameters modifying the semantics of the hardware event
being probed.
Event names are PMC architecture dependent, but the PMC library defines
machine independent aliases for commonly used events.
.Ss Event Name Aliases
Event name aliases are CPU architecture independent names for commonly
used events.
The following aliases are known to this version of the
.Nm pmc
library:
.Bl -tag -width indent
.It Li branches
Measure the number of branches retired.
.It Li branch-mispredicts
Measure the number of retired branches that were mispredicted.
.It Li cycles
Measure processor cycles.
This event is implemented using the processor's Time Stamp Counter
register.
.It Li dc-misses
Measure the number of data cache misses.
.It Li ic-misses
Measure the number of instruction cache misses.
.It Li instructions
Measure the number of instructions retired.
.It Li interrupts
Measure the number of interrupts seen.
.It Li unhalted-cycles
Measure the number of cycles the processor is not in a halted
or sleep state.
.El
.Ss Time Stamp Counter (TSC)
The timestamp counter is a monotonically non-decreasing counter that
counts processor cycles.
.Pp
In the i386 architecture, this counter may
be selected by requesting an event with event specifier
.Dq Li tsc .
The
.Dq Li tsc
event does not support any further qualifiers.
It can only be allocated in system-wide counting mode,
and is a read-only counter.
Multiple processes are allowed to allocate the TSC.
Once allocated, it may be read using the
.Fn pmc_read
function, or by using the RDTSC instruction.
.Ss AMD (K7) PMCs
These PMCs are present in the
.Tn "AMD Athlon"
series of CPUs and are documented in:
.Rs
.%B "AMD Athlon Processor x86 Code Optimization Guide"
.%N "Publication No. 22007"
.%D "February 2002"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K7 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li unitmask= Ns Ar mask
This qualifier is used to further qualify a select few events,
.Dq Li k7-dc-refills-from-l2 ,
.Dq Li k7-dc-refills-from-system
and
.Dq Li k7-dc-writebacks .
Here
.Ar mask
is a string of the following characters optionally separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li m
Count operations for lines in the
.Dq Modified
state.
.It Li o
Count operations for lines in the
.Dq Owner
state.
.It Li e
Count operations for lines in the
.Dq Exclusive
state.
.It Li s
Count operations for lines in the
.Dq Shared
state.
.It Li i
Count operations for lines in the
.Dq Invalid
state.
.El
.Pp
If no
.Dq Li unitmask
qualifier is specified, the default is to count events for cache
lines in any of the above states.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K7 PMCs are:
.Bl -tag -width indent
.It Li k7-dc-accesses
Count data cache accesses.
.It Li k7-dc-misses
Count data cache misses.
.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
Count data cache refills from L2 cache.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
Count data cache refills from system memory.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
Count data cache writebacks.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
Count L1 DTLB misses and L2 DTLB hits.
.It Li k7-l1-and-l2-dtlb-misses
Count L1 and L2 DTLB misses.
.It Li k7-misaligned-references
Count misaligned data references.
.It Li k7-ic-fetches
Count instruction cache fetches.
.It Li k7-ic-misses
Count instruction cache misses.
.It Li k7-l1-itlb-misses
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k7-l1-l2-itlb-misses
Count L1 (and L2) ITLB misses.
.It Li k7-retired-instructions
Count all retired instructions.
.It Li k7-retired-ops
Count retired ops.
.It Li k7-retired-branches
Count all retired branches (conditional, unconditional, exceptions
and interrupts).
.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
.It Li k7-retired-taken-branches
Count retired taken branches.
.It Li k7-retired-taken-branches-mispredicted
Count mispredicted taken branches that were retired.
.It Li k7-retired-far-control-transfers
Count retired far control transfers.
.It Li k7-retired-resync-branches
Count retired resync branches (non control transfer branches).
.It Li k7-interrupts-masked-cycles
Count the number of cycles when the processor's
.Va IF
flag was zero.
.It Li k7-interrupts-masked-while-pending-cycles
Count the number of cycles interrupts were masked while pending due
to the processor's
.Va IF
flag being zero.
.It Li k7-hardware-interrupts
Count the number of taken hardware interrupts.
.El
.Ss AMD (K8) PMCs
These PMCs are present in the
.Tn "AMD Athlon64"
and
.Tn "AMD Opteron"
series of CPUs.
They are documented in:
.Rs
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%D "April 2004"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K8 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other fields.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
STPCLK states.
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cancelled
Count cancelled requests.
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tag-snoop
Count tag snoop requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-dc-access
Count data cache accesses including microcode scratchpad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
vintage.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li accesses
Count data cache accesses by lock instructions.
.It Li misses
Count data cache misses by lock instructions.
.El
.Pp
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li load
Count load operations.
.It Li nta
Count non-temporal operations.
.It Li store
Count store operations.
.El
.Pp
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
.It Li k8-dc-miss
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li scrubber
Count scrubber detected errors.
.It Li piggyback
Count piggyback scrubber errors.
.El
.Pp
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
.El
.Pp
The default is to count all types of ops.
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
was empty).
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retiral.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
is pending.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
.El
.Pp
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., cycles when the CPU
RFLAGS field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while the CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li low-op-pos-0
Count instructions with the low op in position 0.
.It Li low-op-pos-1
Count instructions with the low op in position 1.
.It Li low-op-pos-2
Count instructions with the low op in position 2.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmx-3dnow
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
.It Li x87
Count x87 instructions.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
address miscompare.
.It Li k8-fr-retired-uops
Count retired uops.
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
.It Li k8-ic-fetch
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ic-miss
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Pp
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
.It Li locked-instructions
Count the number of lock instructions executed.
.El
.Pp
The default is to count the number of lock instructions executed.
.It Li k8-ls-microarchitectural-late-cancel
Count microarchitectural late cancels of operations in the load/store
unit.
.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
Count microarchitectural resyncs caused by self-modifying code.
.It Li k8-ls-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
.It Li k8-ls-retired-cpuid-instructions
Count retired CPUID instructions.
.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
Count segment register loads.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cs
Count CS register loads.
.It Li ds
Count DS register loads.
.It Li es
Count ES register loads.
.It Li fs
Count FS register loads.
.It Li gs
Count GS register loads.
.\" .It Li hs
.\" Count HS register loads.
.\" XXX "HS" register?
.It Li ss
Count SS register loads.
.El
.Pp
The default is to count all types of loads.
.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
Count memory controller bypass counter saturation events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dram-controller-interface-bypass
Count DRAM controller interface bypasses.
.It Li dram-controller-queue-bypass
Count DRAM controller queue bypasses.
.It Li memory-controller-hi-pri-bypass
Count memory controller high priority bypasses.
.It Li memory-controller-lo-pri-bypass
Count memory controller low priority bypasses.
.El
.It Li k8-nb-memory-controller-dram-slots-missed
Count memory controller DRAM command slots missed (in MemClks).
.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
Count memory controller page access events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li page-conflict
Count page conflicts.
.It Li page-hit
Count page hits.
.It Li page-miss
Count page misses.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
Count probe events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li probe-hit
Count all probe hits.
.It Li probe-hit-dirty-no-memory-cancel
Count probe hits without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count probe hits with memory cancels.
.It Li probe-miss
Count probe misses.
.El
.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
Count sized commands issued.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nonpostwrszbyte
.It Li nonpostwrszdword
.It Li postwrszbyte
.It Li postwrszdword
.It Li rdszbyte
.It Li rdszdword
.It Li rdmodwr
.El
.Pp
The default is to count all types of commands.
.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.\" XXX doc is unclear whether these are cycle counts or event counts
.It Li dimm-turnaround
Count DIMM turnarounds.
.It Li read-to-write-turnaround
Count read to write turnarounds.
.It Li write-to-read-turnaround
Count write to read turnarounds.
.El
.Pp
The default is to count all types of events.
.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
Count events on the HyperTransport(tm) buses.
These events may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li buffer-release
Count buffer release messages sent.
.It Li command
Count command messages sent.
.It Li data
Count data messages sent.
.It Li nop
Count nop messages sent.
.El
.Pp
The default is to count all types of messages.
.El
.Ss Intel P6 PMCS
Intel P6 PMCs are present in Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn Celeron ,
.Tn "Pentium III"
and
.Tn "Pentium M"
processors.
.Pp
These CPUs have two counters.
Some events may only be used on specific counters and some events are
defined only on specific processor models.
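.Pp
The counter restrictions noted in the event descriptions of this
section can be collected into a small lookup table.
The sketch below is illustrative only: the structure, the table and the
function
.Fn p6_required_counter
are hypothetical names that are not part of
.Lb libpmc ,
and the table lists only the events that this section describes as
being allocated on a specific counter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative table (not part of libpmc): P6 events that this
 * section documents as being allocated only on a specific counter.
 */
struct p6_counter_rule {
	const char *event;	/* event specifier name */
	int counter;		/* the only counter the event may use */
};

static const struct p6_counter_rule p6_counter_rules[] = {
	{ "p6-cycles-div-busy",  0 },
	{ "p6-flops",            0 },
	{ "p6-fp-comps-ops-exe", 0 },
	{ "p6-div",              1 },
	{ "p6-fp-assist",        1 },
	{ "p6-mul",              1 },
};

/*
 * Return the counter (0 or 1) that the named event is documented to
 * require, or -1 if this section notes no restriction for it.
 */
int
p6_required_counter(const char *event)
{
	size_t i;
	size_t n = sizeof(p6_counter_rules) / sizeof(p6_counter_rules[0]);

	for (i = 0; i < n; i++)
		if (strcmp(event, p6_counter_rules[i].event) == 0)
			return (p6_counter_rules[i].counter);
	return (-1);
}
```

An application could use such a check to avoid requesting two events
that are both restricted to the same counter.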
.Pp
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
.Pp
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) III Processor Specification Update"
.%N "Document Number: 244453-054"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel P6 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li cmask= Ns Ar value
Configure the PMC to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the PMC to count the number of deasserted to asserted
transitions of the conditions expressed by the other qualifiers.
If specified, the counter will increment only once whenever a
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li cmask
qualifier is present, making the counter increment when the number of
events per cycle is less than the value specified by the
.Dq Li cmask
qualifier.
.It Li os
Configure the PMC to count events happening at processor privilege
level 0.
.It Li umask= Ns Ar value
This qualifier is used to further qualify the event selected (see
below).
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
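.Pp
The common qualifiers above can be illustrated with a short sketch.
It assumes that qualifiers are appended to the event name separated by
commas, following the
.Dq Li ,umask=
form used in the event list of this section.
The functions
.Fn p6_format_spec
and
.Fn p6_cmask_increments
are hypothetical helpers and are not part of
.Lb libpmc ;
the second one only models the
.Dq Li cmask
and
.Dq Li inv
comparison as described above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of libpmc): format a P6 event
 * specifier from an event name and the common qualifiers.
 * A negative cmask omits the "cmask=" qualifier.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
p6_format_spec(char *buf, size_t len, const char *event,
    int cmask, int edge, int inv, int os, int usr)
{
	char cm[32] = "";
	int n;

	if (cmask >= 0)
		snprintf(cm, sizeof(cm), ",cmask=%d", cmask);
	n = snprintf(buf, len, "%s%s%s%s%s%s", event, cm,
	    edge ? ",edge" : "", inv ? ",inv" : "",
	    os ? ",os" : "", usr ? ",usr" : "");
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * Model of the cmask comparison: without "inv" the counter increments
 * in a cycle when events >= cmask; with "inv" when events < cmask.
 * Returns 1 if the counter would increment in that cycle, else 0.
 */
int
p6_cmask_increments(int events, int cmask, int inv)
{
	return (inv ? events < cmask : events >= cmask);
}
```

For example, calling
.Fn p6_format_spec
with the event name
.Dq Li p6-inst-retired ,
a cmask of 2 and the inv and usr flags set yields the string
.Dq Li p6-inst-retired,cmask=2,inv,usr .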
.Pp
The event specifiers supported by Intel P6 PMCs are:
.Bl -tag -width indent
.It Li p6-baclears
Count the number of times a static branch prediction was made by the
branch decoder because the BTB did not have a prediction.
.It Li p6-br-bac-missp-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
mispredicted at the Front End (BAC).
.It Li p6-br-bogus
Count the number of bogus branches.
.It Li p6-br-call-exec
.Pq Tn "Pentium M"
Count the number of call instructions executed.
.It Li p6-br-call-missp-exec
.Pq Tn "Pentium M"
Count the number of call instructions executed that were mispredicted.
.It Li p6-br-cnd-exec
.Pq Tn "Pentium M"
Count the number of conditional branch instructions executed.
.It Li p6-br-cnd-missp-exec
.Pq Tn "Pentium M"
Count the number of conditional branch instructions executed that were
mispredicted.
.It Li p6-br-ind-call-exec
.Pq Tn "Pentium M"
Count the number of indirect call instructions executed.
.It Li p6-br-ind-exec
.Pq Tn "Pentium M"
Count the number of indirect branch instructions executed.
.It Li p6-br-ind-missp-exec
.Pq Tn "Pentium M"
Count the number of indirect branch instructions executed that were
mispredicted.
.It Li p6-br-inst-decoded
Count the number of branch instructions decoded.
.It Li p6-br-inst-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed but not necessarily
retired.
.It Li p6-br-inst-retired
Count the number of branch instructions retired.
.It Li p6-br-miss-pred-retired
Count the number of mispredicted branch instructions retired.
.It Li p6-br-miss-pred-taken-ret
Count the number of taken mispredicted branches retired.
.It Li p6-br-missp-exec
.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
mispredicted at execution.
.It Li p6-br-ret-bac-missp-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed that were
mispredicted at the Front End (BAC).
.It Li p6-br-ret-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed.
.It Li p6-br-ret-missp-exec
.Pq Tn "Pentium M"
Count the number of return instructions executed that were
mispredicted at execution.
.It Li p6-br-taken-retired
Count the number of taken branches retired.
.It Li p6-btb-misses
Count the number of branches for which the BTB did not produce a
prediction.
.It Li p6-bus-bnr-drv
Count the number of bus clock cycles during which this processor is
driving the BNR# pin.
.It Li p6-bus-data-rcv
Count the number of bus clock cycles during which this processor is
receiving data.
.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which DRDY# is asserted.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-hit-drv
Count the number of bus clock cycles during which this processor is
driving the HIT# pin.
.It Li p6-bus-hitm-drv
Count the number of bus clock cycles during which this processor is
driving the HITM# pin.
.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which LOCK# is asserted on the
external system bus.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-req-outstanding
Count the number of bus requests outstanding in any given cycle.
.It Li p6-bus-snoop-stall
Count the number of clock cycles during which the bus is snoop stalled.
.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
Count the number of completed bus transactions of any kind.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
Count the number of burst read transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
Count the number of completed burst transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
Count the number of completed deferred transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
Count the number of completed instruction fetch transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
Count the number of completed invalidate transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
Count the number of completed memory transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
Count the number of completed partial write transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
Count the number of completed read-for-ownership transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
Count the number of completed I/O transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
Count the number of completed partial transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
Count the number of completed write-back transactions.
An additional qualifier may be specified and comprises one of the following
keywords:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count transactions generated by any agent on the bus.
.It Li self
Count transactions generated by this processor.
.El
.Pp
The default is to count operations generated by this processor.
.It Li p6-cpu-clk-unhalted
Count the number of cycles during which the processor was not halted.
.Pp
.Pq Tn "Pentium M"
Count the number of cycles during which the processor was not halted
and not in a thermal trip.
.It Li p6-cycles-div-busy
Count the number of cycles during which the divider is busy and cannot
accept new divides.
This event is only allocated on counter 0.
.It Li p6-cycles-in-pending-and-masked
Count the number of processor cycles for which interrupts were
disabled and interrupts were pending.
.It Li p6-cycles-int-masked
Count the number of processor cycles for which interrupts were
disabled.
.It Li p6-data-mem-refs
Count all loads and all stores using any memory type, including
internal retries.
Each part of a split store is counted separately.
.It Li p6-dcu-lines-in
Count the total lines allocated in the data cache unit.
.It Li p6-dcu-m-lines-in
Count the number of M state lines allocated in the data cache unit.
.It Li p6-dcu-m-lines-out
Count the number of M state lines evicted from the data cache unit.
.It Li p6-dcu-miss-outstanding
Count the weighted number of cycles while a data cache unit miss is
outstanding, incremented by the number of outstanding cache misses at
any time.
.It Li p6-div
Count the number of integer and floating-point divides including
speculative divides.
This event is only allocated on counter 1.
.It Li p6-emon-esp-uops
.Pq Tn "Pentium M"
Count the total number of micro-ops.
.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of
.Tn "Enhanced Intel SpeedStep"
transitions.
An additional qualifier may be specified, and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all transitions.
.It Li freq
Count only frequency transitions.
.El
.Pp
The default is to count all transitions.
.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of retired fused micro-ops.
An additional qualifier may be specified, and may be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all fused micro-ops.
.It Li loadop
Count only load and op micro-ops.
.It Li stdsta
Count only STD/STA micro-ops.
.El
.Pp
The default is to count all fused micro-ops.
.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE computational instructions retired.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li packed-and-scalar
Count packed and scalar operations.
.It Li scalar
Count scalar operations only.
.El
.Pp
The default is to count packed and scalar operations.
.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE instructions retired.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li packed-and-scalar
Count packed and scalar operations.
.It Li scalar
Count scalar operations only.
.El
.Pp
The default is to count packed and scalar operations.
.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of SSE prefetch or weakly ordered instructions
dispatched (including speculative prefetches).
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nta
Count non-temporal prefetches.
.It Li t1
Count prefetches to L1.
.It Li t2
Count prefetches to L2.
.It Li wos
Count weakly ordered stores.
.El
.Pp
The default is to count non-temporal prefetches.
.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium III"
Count the number of prefetch or weakly ordered instructions that miss
all caches.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li nta
Count non-temporal prefetches.
.It Li t1
Count prefetches to L1.
.It Li t2
Count prefetches to L2.
.It Li wos
Count weakly ordered stores.
.El
.Pp
The default is to count non-temporal prefetches.
.It Li p6-emon-pref-rqsts-dn
.Pq Tn "Pentium M"
Count the number of downward prefetches issued.
.It Li p6-emon-pref-rqsts-up
.Pq Tn "Pentium M"
Count the number of upward prefetches issued.
.It Li p6-emon-simd-instr-retired
.Pq Tn "Pentium M"
Count the number of retired
.Tn MMX
instructions.
.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of computational SSE instructions retired.
An additional qualifier may be specified and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-packed-single
Count SSE packed-single instructions.
.It Li sse-scalar-single
Count SSE scalar-single instructions.
.It Li sse2-packed-double
Count SSE2 packed-double instructions.
.It Li sse2-scalar-double
Count SSE2 scalar-double instructions.
.El
.Pp
The default is to count SSE packed-single instructions.
.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium M"
Count the number of SSE instructions retired.
An additional qualifier may be specified, and can be one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-packed-single
Count SSE packed-single instructions.
.It Li sse-packed-single-scalar-single
Count SSE packed-single and scalar-single instructions.
.It Li sse2-packed-double
Count SSE2 packed-double instructions.
.It Li sse2-scalar-double
Count SSE2 scalar-double instructions.
.El
.Pp
The default is to count SSE packed-single instructions.
.It Li p6-emon-synch-uops
.Pq Tn "Pentium M"
Count the number of sync micro-ops.
.It Li p6-emon-thermal-trip
.Pq Tn "Pentium M"
Count the duration or occurrences of thermal trips.
Use the
.Dq Li edge
qualifier to count occurrences of thermal trips.
.It Li p6-emon-unfusion
.Pq Tn "Pentium M"
Count the number of unfusion events in the reorder buffer.
.It Li p6-flops
Count the number of computational floating-point operations retired.
This event is only allocated on counter 0.
.It Li p6-fp-assist
Count the number of floating-point exceptions handled by microcode.
This event is only allocated on counter 1.
.It Li p6-fp-comps-ops-exe
Count the number of computational floating-point operations executed.
This event is only allocated on counter 0.
.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of transitions between MMX and floating-point
instructions.
An additional qualifier may be specified, and comprises one of the
following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmxtofp
Count transitions from MMX instructions to floating-point instructions.
.It Li fptommx
Count transitions from floating-point instructions to MMX instructions.
.El
.Pp
The default is to count MMX to floating-point transitions.
.It Li p6-hw-int-rx
Count the number of hardware interrupts received.
.It Li p6-ifu-fetch
Count the number of instruction fetches, both cacheable and non-cacheable.
.It Li p6-ifu-fetch-miss
Count the number of instruction fetch misses (i.e., those that produce
memory accesses).
.It Li p6-ifu-mem-stall
Count the number of cycles instruction fetch is stalled for any reason.
.It Li p6-ild-stall
Count the number of cycles the instruction length decoder is stalled.
.It Li p6-inst-decoded
Count the number of instructions decoded.
.It Li p6-inst-retired
Count the number of instructions retired.
.It Li p6-itlb-miss
Count the number of instruction TLB misses.
.It Li p6-l2-ads
Count the number of L2 address strobes.
.It Li p6-l2-dbus-busy
Count the number of cycles during which the L2 cache data bus was busy.
.It Li p6-l2-dbus-busy-rd
Count the number of cycles during which the L2 cache data bus was busy
transferring read data from L2 to the processor.
.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
Count the number of L2 instruction fetches.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
Count the number of L2 data loads.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M"
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E53.
.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
Count the number of L2 lines allocated.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M"
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E45.
.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
Count the number of L2 lines evicted.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
.Pq Tn "Pentium M"
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li hw
.Pq Tn "Pentium M"
Count hardware-prefetched lines only.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li nonhw
.Pq Tn "Pentium M"
Exclude hardware-prefetched lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default on processors other than
.Tn "Pentium M"
processors is to count operations affecting all (MESI) state lines.
The default on
.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E45.
.It Li p6-l2-m-lines-inm
Count the number of modified lines allocated in L2 cache.
.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
Count the number of L2 M-state lines evicted.
.Pp
.Pq Tn "Pentium M"
On these processors an additional qualifier may be specified and
comprises a list of the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li both
Count both hardware-prefetched lines and non-hardware-prefetched lines.
.It Li hw
Count hardware-prefetched lines only.
.It Li nonhw
Exclude hardware-prefetched lines.
.El
.Pp
The default is to count both hardware-prefetched and
non-hardware-prefetched operations.
.Pq Errata
This event is affected by processor erratum E53.
.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
Count the total number of L2 requests.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier
Count the number of L2 data stores.
An additional qualifier may be specified and comprises a list of the following
keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li e
Count operations affecting E (exclusive) state lines.
.It Li i
Count operations affecting I (invalid) state lines.
.It Li m
Count operations affecting M (modified) state lines.
.It Li s
Count operations affecting S (shared) state lines.
.El
.Pp
The default is to count operations affecting all (MESI) state lines.
.It Li p6-ld-blocks
Count the number of load operations delayed due to store buffer blocks.
.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (crossing a 64
bit boundary).
.It Li p6-mmx-assist
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX assists executed.
.It Li p6-mmx-instr-exec
.Pq Tn Celeron , Tn "Pentium II"
Count the number of MMX instructions executed, except MOVQ and MOVD
stores from register to memory.
.It Li p6-mmx-instr-ret
.Pq Tn "Pentium II"
Count the number of MMX instructions retired.
.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX instructions executed.
An additional qualifier may be specified and comprises a list of
the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li pack
Count MMX pack operation instructions.
.It Li packed-arithmetic
Count MMX packed arithmetic instructions.
.It Li packed-logical
Count MMX packed logical instructions.
.It Li packed-multiply
Count MMX packed multiply instructions.
.It Li packed-shift
Count MMX packed shift instructions.
.It Li unpack
Count MMX unpack operation instructions.
.El
.Pp
The default is to count all operations.
.It Li p6-mmx-sat-instr-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX saturating instructions executed.
.It Li p6-mmx-uops-exec
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of MMX micro-ops executed.
.It Li p6-mul
Count the number of integer and floating-point multiplies, including
speculative multiplies.
This event is only allocated on counter 1.
.It Li p6-partial-rat-stalls
Count the number of cycles or events for partial stalls.
.It Li p6-resource-stalls
Count the number of cycles there was a resource related stall of any kind.
.It Li p6-ret-seg-renames
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register rename events retired.
.It Li p6-sb-drains
Count the number of cycles the store buffer is draining.
.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renames.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count renames for segment register DS.
.It Li es
Count renames for segment register ES.
.It Li fs
Count renames for segment register FS.
.It Li gs
Count renames for segment register GS.
.El
.Pp
The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
.Pq Tn "Pentium II" , Tn "Pentium III"
Count the number of segment register renaming stalls.
An additional qualifier may be specified, and comprises a list of the
following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li ds
Count stalls for segment register DS.
.It Li es
Count stalls for segment register ES.
.It Li fs
Count stalls for segment register FS.
.It Li gs
Count stalls for segment register GS.
.El
.Pp
The default is to count operations affecting all segment registers.
.It Li p6-segment-reg-loads
Count the number of segment register loads.
.It Li p6-uops-retired
Count the number of micro-ops retired.
.El
.Ss Intel P4 PMCS
Intel P4 PMCs are present in Intel
.Tn "Pentium 4"
and
.Tn Xeon
processors.
These PMCs are documented in
.Rs
.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
.%T "Volume 3: System Programming Guide"
.%N "Order Number 245472-012"
.%D 2003
.%Q "Intel Corporation"
.Re
Further information about using these PMCs may be found in
.Rs
.%B "IA-32 Intel(R) Architecture Optimization Guide"
.%D 2003
.%N "Order Number 248966-009"
.%Q "Intel Corporation"
.Re
Some of these events are affected by processor errata described in
.Rs
.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
.%N "Document Number: 249199-059"
.%D "April 2005"
.%Q "Intel Corporation"
.Re
.Pp
Event specifiers for Intel P4 PMCs can have the following common
qualifiers:
.Bl -tag -width indent
.It Li active= Ns Ar choice
(On P4 HTT CPUs) Filter event counting based on which logical
processors are active.
The allowed values of
.Ar choice
are:
.Pp
.Bl -tag -width indent -compact
.It Li any
Count when either logical processor is active.
.It Li both
Count when both logical processors are active.
.It Li none
Count only when neither logical processor is active.
.It Li single
Count only when one logical processor is active.
.El
.Pp
The default is
.Dq Li both .
.It Li cascade
Configure the PMC to cascade onto its partner.
See
.Sx "Cascading P4 PMCs"
below for more information.
.It Li edge
Configure the counter to count false-to-true transitions of the threshold
comparison output.
This qualifier only takes effect if a threshold qualifier has also been
specified.
.It Li complement
Configure the counter to increment only when the event count seen is
less than the threshold qualifier value specified.
.It Li mask= Ns Ar qualifier
Many event specifiers for Intel P4 PMCs need to be additionally
qualified using a mask qualifier.
The allowed syntax for these qualifiers is event specific and is
described along with the events.
.It Li os
Configure the PMC to count when the CPL of the processor is 0.
.It Li precise
Select precise event based sampling.
Precise sampling is supported by the hardware for a limited set of
events.
.It Li tag= Ns Ar value
Configure the PMC to tag the internal uop selected by the other
fields in this event specifier with value
.Ar value .
This feature is used when cascading PMCs.
.It Li threshold= Ns Ar value
Configure the PMC to increment only when the event counts seen are
greater than the specified threshold value
.Ar value .
.It Li usr
Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
.Pp
On Intel Pentium 4 processors with HTT, events are
divided into two classes:
.Pp
.Bl -tag -width indent -compact
.It "TS Events"
are those where hardware can distinguish events
generated on one logical processor from those generated on the
other.
.It "TI Events"
are those where hardware cannot differentiate between events
generated by multiple logical processors in a package.
.El
.Pp
Only TS events are allowed for use with process-mode PMCs on
Pentium-4/HTT CPUs.
.Pp
The event specifiers supported by Intel P4 PMCs are:
.Pp
.Bl -tag -width indent
.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 128 bit SIMD integer operands in memory or
XMM register.
.El
.Pp
If an instruction contains more than one 128 bit MMX uop, then each
uop will be counted.
.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count MMX instructions that operate on 64 bit SIMD operands.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on 64 bit SIMD integer operands in memory or
in MMX registers.
.El
.Pp
If an instruction contains more than one 64 bit MMX uop, then each
uop will be counted.
.It Li p4-b2b-cycles
.Pq "TI event"
Count back-to-back bus cycles.
Further documentation for this event is unavailable.
.It Li p4-bnr
.Pq "TI event"
Count bus-not-ready conditions.
Further documentation for this event is unavailable.
.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count instruction fetch requests qualified by additional
flags specified in
.Ar qualifier .
At this point only one flag is supported:
.Pp
.Bl -tag -width indent -compact
.It Li tcmiss
Count trace cache lookup misses.
.El
.Pp
The default qualifier is also
.Dq Li mask=tcmiss .
.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count retired branches.
Qualifier
.Ar flags
is a list of the following
.Ql +
separated strings:
.Pp
.Bl -tag -width indent -compact
.It Li mmnp
Count branches not-taken and predicted.
.It Li mmnm
Count branches not-taken and mis-predicted.
.It Li mmtp
Count branches taken and predicted.
.It Li mmtm
Count branches taken and mis-predicted.
.El
.Pp
The default qualifier counts all four kinds of branches.
.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count the number of entries (clipped at 15) currently active in the
BSQ.
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two-bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two-bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for an 8-byte bus chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
Edge triggering using the
.Dq Li edge
qualifier should not be used with this event when counting cycles.
.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count allocations in the bus sequence unit according to the flags
specified in
.Ar qualifier ,
which is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li req-type0 , Li req-type1
Forms a 2-bit number used to select the request type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
reads excluding read invalidate
.It Li 1
read invalidates
.It Li 2
writes other than writebacks
.It Li 3
writebacks
.El
.Pp
Bit
.Dq Li req-type1
is the MSB for this two-bit number.
.It Li req-len0 , Li req-len1
Forms a two-bit number that specifies the request length encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
.El
.Pp
Bit
.Dq Li req-len1
is the MSB for this two-bit number.
.It Li req-io-type
Count requests that are input or output requests.
.It Li req-lock-type
Count requests that lock the bus.
.It Li req-lock-cache
Count requests that lock the cache.
.It Li req-split-type
Count requests for an 8-byte bus chunk that is split across an
8-byte boundary.
.It Li req-dem-type
Count requests that are demand (not prefetches) if set.
Count requests that are prefetches if not set.
.It Li req-ord-type
Count requests that are ordered.
.It Li mem-type0 , Li mem-type1 , Li mem-type2
Forms a 3-bit number that specifies a memory type encoding:
.Pp
.Bl -tag -width indent -compact
.It Li 0
UC
.It Li 1
USWC
.It Li 4
WT
.It Li 5
WP
.It Li 6
WB
.El
.Pp
Bit
.Dq Li mem-type2
is the MSB of this 3-bit number.
.El
.Pp
The default qualifier has all the above bits set.
.Pp
This event is usually used along with the
.Dq Li edge
qualifier to avoid multiple counting.
.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count cache references as seen by the bus unit (2nd or 3rd level
cache references).
Qualifier
.Ar qualifier
is a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li rd-2ndl-hits
Count 2nd level cache hits in the shared state.
.It Li rd-2ndl-hite
Count 2nd level cache hits in the exclusive state.
.It Li rd-2ndl-hitm
Count 2nd level cache hits in the modified state.
.It Li rd-3rdl-hits
Count 3rd level cache hits in the shared state.
.It Li rd-3rdl-hite
Count 3rd level cache hits in the exclusive state.
.It Li rd-3rdl-hitm
Count 3rd level cache hits in the modified state.
.It Li rd-2ndl-miss
Count 2nd level cache misses.
.It Li rd-3rdl-miss
Count 3rd level cache misses.
.It Li wr-2ndl-miss
Count write-back lookups from the data access cache that miss the 2nd
level cache.
.El
.Pp
The default is to count all the above events.
.It Li p4-execution-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the execution
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
The marked uops are not bogus.
.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to set all the above flags.
This event can be used for precise event based sampling.
.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the front-end
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to select both kinds of events.
This event can be used for precise event based sampling.
.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each DBSY or DRDY event selected by qualifier
.Ar flags .
Qualifier
.Ar flags
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li drdy-drv
Count when this processor is driving data onto the bus.
.It Li drdy-own
Count when this processor is reading data from the bus.
.It Li drdy-other
Count when data is on the bus but not being sampled by this processor.
.It Li dbsy-drv
Count when this processor reserves the bus for use in the next cycle
in order to drive data.
.It Li dbsy-own
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will sample.
.It Li dbsy-other
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will not sample.
.El
.Pp
Flags
.Dq Li drdy-own
and
.Dq Li drdy-other
are mutually exclusive.
Flags
.Dq Li dbsy-own
and
.Dq Li dbsy-other
are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count cycles during which the processor is not stopped.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li running
Count cycles when the processor is active.
.El
.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogusntag
Count non-bogus instructions that are not tagged.
.It Li nbogustag
Count non-bogus instructions that are tagged.
.It Li bogusntag
Count bogus instructions that are not tagged.
.It Li bogustag
Count bogus instructions that are tagged.
.El
.Pp
The default qualifier counts all the above kinds of instructions.
.It Li p4-ioq-active-entries Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count the number of entries (clipped at 15) in the IOQ that are
active.
The event masks are specified by qualifier
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier should not be used when counting cycles with this event.
The exact behaviour of this event depends on the processor revision.
.It Li p4-ioq-allocation Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count various types of transactions on the bus matching the flags set
in
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier is normally used with this event to prevent multiple
counting.
The exact behaviour of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li hit
Count ITLB hits.
.It Li miss
Count ITLB misses.
.It Li hit-uc
Count uncacheable ITLB hits.
.El
.Pp
If no
.Ar qualifier
is specified the default is to count all three kinds of ITLB
translations.
.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count replayed events at the load port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-ld
Count split loads.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-ld .
.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted IA-32 branch instructions.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count non-bogus retired branch instructions.
.El
.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of pipeline clears seen by the processor.
Qualifier
.Ar flags
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li clear
Count for a portion of the many cycles when the machine is being
cleared for any reason.
.It Li moclear
Count machine clears due to memory ordering issues.
.It Li smclear
Count machine clears due to self-modifying code.
.El
.Pp
Use qualifier
.Dq Li edge
to get a count of occurrences of machine clears.
The default qualifier is
.Dq Li clear .
.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the cancelling of various kinds of requests in the data cache
address control unit of the CPU.
The qualifier
.Ar event-list
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li st-rb-full
Requests cancelled because no store request buffer was available.
.It Li 64k-conf
Requests that conflict due to 64K aliasing.
.El
.Pp
If
.Ar event-list
is not specified, then the default is to count both kinds of events.
.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the completion of load split, store split, uncacheable split and
uncacheable load operations selected by qualifier
.Ar event-list .
The qualifier
.Ar event-list
is a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li lsc
Count load splits completed, excluding loads from uncacheable or
write-combining areas.
.It Li ssc
Count any split stores completed.
.El
.Pp
The default is to count both kinds of operations.
.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count load replays triggered by the memory order buffer.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li no-sta
Count replays because of unknown store addresses.
.It Li no-std
Count replays because of unknown store data.
.It Li partial-data
Count replays because of partially overlapped data accesses between
load and store operations.
.It Li unalgn-addr
Count replays because of mismatches in the lower 4 bits of load and
store operations.
.El
.Pp
The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed double-precision operands.
.El
.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed single-precision operands.
.El
.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count page walks performed by the page miss handler.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dtmiss
Count page walks for data TLB misses.
.It Li itmiss
Count page walks for instruction TLB misses.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li dtmiss+itmiss .
.It Li p4-replay-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the replay
tagging mechanism.
Qualifier
.Ar flags
contains a
.Ql +
separated set of the following strings:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default qualifier counts both kinds of uops.
This event can be used for precise event based sampling.
.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the occurrence or latency of stalls in the allocator.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li sbfull
A stall due to the lack of store buffers.
.El
.It Li p4-response
.Pq "TI event"
Count different types of responses.
Further documentation on this event is not available.
.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count direct and indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count the number of scalar double-precision uops.
.El
.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar single-precision operands.
.El
.It Li p4-snoop
.Pq "TI event"
Count snoop traffic.
Further documentation on this event is not available.
.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times an assist is required to handle problems
with the operands for SSE and SSE2 operations.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count assists for all SSE and SSE2 uops.
.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, or in machine clear or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
.It Li p4-uop-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
This event is used in conjunction with the front-end at-retirement
mechanism to tag load and store uops.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li tagloads
Mark uops that are load operations.
.It Li tagstores
Mark uops that are store operations.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count uops retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count marked uops that are not bogus.
.It Li bogus
Count marked uops that are bogus.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count write-combining buffer operations.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li wcb-evicts
WC buffer evictions due to any cause.
.It Li wcb-full-evict
WC buffer evictions due to no WC buffer being available.
.El
.Pp
The default qualifier counts both kinds of evictions.
.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of x87 instructions that required special
handling.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li fpsu
Count instructions that saw an FP stack underflow.
.It Li fpso
Count instructions that saw an FP stack overflow.
.It Li poao
Count instructions that saw an x87 output overflow.
.It Li poau
Count instructions that saw an x87 output underflow.
.It Li prea
Count instructions that needed an x87 input assist.
.El
.Pp
The default qualifier counts all the above types of instruction
retirements.
.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count x87 floating-point uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all x87 floating-point uops.
.El
.Pp
If an instruction contains more than one x87 floating-point uop, then
all x87 floating-point uops will be counted.
This event does not count x87 floating-point data movement operations.
.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
This event does not count integer move uops.
Qualifier
.Ar flags
may contain the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li allp0
Count all x87 and SIMD store and move uops.
.It Li allp2
Count all x87 and SIMD load uops.
.El
.Pp
The default is to count all uops.
.Pq Errata
This event may be affected by processor erratum N43.
.El
.Ss "Cascading P4 PMCs"
PMC cascading support is currently incomplete.
While individual event counters may be allocated with a
.Dq Li cascade
qualifier, the current API does not offer the ability
to name and allocate all the resources needed for a
cascaded event counter pair in a single operation.
.Ss "Precise Event Based Sampling"
Support for precise event based sampling is currently
unimplemented.
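.Ss "Example: Composing a Qualifier Mask"
Many of the events listed above accept a
.Ql ,mask=
qualifier whose value is a list of flag names joined by
.Ql +
characters.
The following is a minimal sketch, not part of the
.Nm pmc
library, of one way an application might assemble such an event
specifier string.
The helper name
.Fn build_event_spec
and its return convention are assumptions made for illustration only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not provided by libpmc): append the given flag
 * names to an event name as a ",mask=" qualifier, joining the flags
 * with '+'.  Returns 0 on success, or -1 if the result does not fit
 * in the caller's buffer.
 */
static int
build_event_spec(char *buf, size_t size, const char *event,
    const char *const *flags, size_t nflags)
{
	size_t i, used;
	int n;

	/* Start with the bare event name. */
	n = snprintf(buf, size, "%s", event);
	if (n < 0 || (size_t)n >= size)
		return (-1);
	used = (size_t)n;

	for (i = 0; i < nflags; i++) {
		/* The first flag follows ",mask="; later flags follow "+". */
		n = snprintf(buf + used, size - used, "%s%s",
		    i == 0 ? ",mask=" : "+", flags[i]);
		if (n < 0 || (size_t)n >= size - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

Called with the event name
.Li p4-tc-deliver-mode
and the flags
.Li DD
and
.Li DB ,
the helper produces the string
.Dq Li p4-tc-deliver-mode,mask=DD+DB .
With no flags, the event name is returned unchanged, so the event's
documented default qualifier applies.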
.Sh COMPATIBILITY
The interface between the
.Nm pmc
library and the
.Xr hwpmc 4
driver is intended to be private to the implementation and may
change.
In order to ease forward compatibility with future versions of the
.Xr hwpmc 4
driver, applications are urged to dynamically link with the
.Nm pmc
library.
.Pp
The
.Nm pmc
API is
.Ud
.Sh SEE ALSO
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr pmcstat 8
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .