.\" Copyright (c) 2010 Fabien Thomas. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed. in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd March 24, 2010
.Dt PMC.WESTMERE 3
.Os
.Sh NAME
.Nm pmc.westmere
.Nd measurement events for
.Tn Intel
.Tn Westmere
family CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
.Tn Intel
.Tn "Westmere"
CPUs contain PMCs conforming to version 2 of the
.Tn Intel
performance measurement architecture.
These CPUs may contain up to two classes of PMCs:
.Bl -tag -width "Li PMC_CLASS_IAP"
.It Li PMC_CLASS_IAF
Fixed-function counters that count only one hardware event per counter.
.It Li PMC_CLASS_IAP
Programmable counters that may be configured to count one of a defined
set of hardware events.
.El
.Pp
The number of PMCs available in each class and their widths need to be
determined at run time by calling
.Xr pmc_cpuinfo 3 .
.Pp
Intel Westmere PMCs are documented in
.Rs
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
.%T "Volume 3B: System Programming Guide, Part 2"
.%N "Order Number: 253669-033US"
.%D December 2009
.%Q "Intel Corporation"
.Re
.Ss WESTMERE FIXED FUNCTION PMCS
These PMCs and their supported events are documented in
.Xr pmc.iaf 3 .
.Ss WESTMERE PROGRAMMABLE PMCS
The programmable PMCs support the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
Event specifiers for these PMCs support the following common
qualifiers:
.Bl -tag -width indent
.It Li rsp= Ns Ar value
Configure the Off-core Response bits.
.Bl -tag -width indent
.It Li DMND_DATA_RD
Counts the number of demand and DCU prefetch data reads of full
and partial cachelines as well as demand data page table entry
cacheline reads. Does not count L2 data read prefetches or
instruction fetches.
.It Li DMND_RFO
Counts the number of demand and DCU prefetch reads for ownership
(RFO) requests generated by a write to a data cacheline. Does not
count L2 RFO.
.It Li DMND_IFETCH
Counts the number of demand and DCU prefetch instruction cacheline
reads. Does not count L2 code read prefetches.
.It Li WB
Counts the number of writeback (modified to exclusive) transactions.
.It Li PF_DATA_RD
Counts the number of data cacheline reads generated by L2 prefetchers.
.It Li PF_RFO
Counts the number of RFO requests generated by L2 prefetchers.
.It Li PF_IFETCH
Counts the number of code reads generated by L2 prefetchers.
.It Li OTHER
Counts one of the following transaction types, including L3 invalidate,
I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
lock, unlock, split lock.
.It Li UNCORE_HIT
L3 Hit: local or remote home requests that hit L3 cache in the uncore
with no coherency actions required (snooping).
.It Li OTHER_CORE_HIT_SNP
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where no modified
copies were found (clean).
.It Li OTHER_CORE_HITM
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where modified
copies were found (HITM).
.It Li REMOTE_CACHE_FWD
L3 Miss: local homed requests that missed the L3 cache and were serviced
by forwarded data following a cross package snoop where no modified
copies were found (remote home requests are not counted).
.It Li REMOTE_DRAM
L3 Miss: remote home requests that missed the L3 cache and were serviced
by remote DRAM.
.It Li LOCAL_DRAM
L3 Miss: local home requests that missed the L3 cache and were serviced
by local DRAM.
.It Li NON_DRAM
Non-DRAM requests that were serviced by IOH.
.El
.It Li cmask= Ns Ar value
Configure the PMC to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the PMC to count the number of de-asserted to asserted
transitions of the conditions expressed by the other qualifiers.
If specified, the counter will increment only once whenever a
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li cmask
qualifier is present, making the counter increment when the number of
events per cycle is less than the value specified by the
.Dq Li cmask
qualifier.
.It Li os
Configure the PMC to count events happening at processor privilege
level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither of the
.Dq Li os
or
.Dq Li usr
qualifiers is specified, the default is to enable both.
.Ss Event Specifiers (Programmable PMCs)
Westmere programmable PMCs support the following events:
.Bl -tag -width indent
.It Li LOAD_BLOCK.OVERLAP_STORE
.Pq Event 03H , Umask 02H
Loads that partially overlap an earlier store.
.It Li SB_DRAIN.ANY
.Pq Event 04H , Umask 07H
All store buffer stall cycles.
.It Li MISALIGN_MEMORY.STORE
.Pq Event 05H , Umask 02H
All stores referenced with a misaligned address.
.It Li STORE_BLOCKS.AT_RET
.Pq Event 06H , Umask 04H
Counts number of loads delayed with at-Retirement block code. The following
loads need to be executed at retirement and wait for all senior stores on
the same thread to be drained: load splitting across 4K boundary (page
split), load accessing uncacheable (UC or USWC) memory, load lock, and load
with page table in UC or USWC memory region.
.It Li STORE_BLOCKS.L1D_BLOCK
.Pq Event 06H , Umask 08H
Cacheable loads delayed with L1D block code.
.It Li PARTIAL_ADDRESS_ALIAS
.Pq Event 07H , Umask 01H
Counts false dependency due to partial address aliasing.
.It Li DTLB_LOAD_MISSES.ANY
.Pq Event 08H , Umask 01H
Counts all load misses that cause a page walk.
.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
.Pq Event 08H , Umask 02H
Counts number of completed page walks due to load miss in the STLB.
.It Li DTLB_LOAD_MISSES.WALK_CYCLES
.Pq Event 08H , Umask 04H
Cycles PMH is busy with a page walk due to a load miss in the STLB.
.It Li DTLB_LOAD_MISSES.STLB_HIT
.Pq Event 08H , Umask 10H
Number of cache load STLB hits.
.It Li DTLB_LOAD_MISSES.PDE_MISS
.Pq Event 08H , Umask 20H
Number of DTLB cache load misses where the low part of the linear to
physical address translation was missed.
.It Li MEM_INST_RETIRED.LOADS
.Pq Event 0BH , Umask 01H
Counts the number of instructions with an architecturally-visible load
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.STORES
.Pq Event 0BH , Umask 02H
Counts the number of instructions with an architecturally-visible store
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
.Pq Event 0BH , Umask 10H
Counts the number of instructions exceeding the latency specified with
the ld_lat facility.
Used in conjunction with the ld_lat facility.
.It Li MEM_STORE_RETIRED.DTLB_MISS
.Pq Event 0CH , Umask 01H
The event counts the number of retired stores that missed the DTLB. The DTLB
miss is not counted if the store operation causes a fault. Does not count
prefetches. Counts both primary and secondary misses to the TLB.
.It Li UOPS_ISSUED.ANY
.Pq Event 0EH , Umask 01H
Counts the number of Uops issued by the Register Allocation Table to the
Reservation Station, i.e.
the UOPs issued from the front end to the back
end.
.It Li UOPS_ISSUED.STALLED_CYCLES
.Pq Event 0EH , Umask 01H
Counts the number of cycles in which no Uops were issued by the Register
Allocation Table to the Reservation Station, i.e. from the front end to the
back end.
Set invert=1, cmask=1.
.It Li UOPS_ISSUED.FUSED
.Pq Event 0EH , Umask 02H
Counts the number of fused Uops that were issued from the Register
Allocation Table to the Reservation Station.
.It Li MEM_UNCORE_RETIRED.LOCAL_HITM
.Pq Event 0FH , Umask 02H
Load instructions retired that HIT modified data in sibling core (Precise
Event).
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT
.Pq Event 0FH , Umask 08H
Load instructions retired with local DRAM and remote cache HIT data sources
(Precise Event).
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
.Pq Event 0FH , Umask 10H
Load instructions retired with a data source of local DRAM or locally homed
remote cache HITM (Precise Event).
.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
.Pq Event 0FH , Umask 20H
Load instructions retired with remote DRAM and remote home-remote cache HITM
data sources (Precise Event).
.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
.Pq Event 0FH , Umask 80H
Load instructions retired I/O (Precise Event).
.It Li FP_COMP_OPS_EXE.X87
.Pq Event 10H , Umask 01H
Counts the number of FP Computational Uops Executed. The number of FADD,
FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTs, integer
DIVs, and IDIVs. This event does not distinguish an FADD used in the middle
of a transcendental flow from a separate FADD instruction.
.It Li FP_COMP_OPS_EXE.MMX
.Pq Event 10H , Umask 02H
Counts number of MMX Uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP
.Pq Event 10H , Umask 04H
Counts number of SSE and SSE2 FP uops executed.
.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
.Pq Event 10H , Umask 08H
Counts number of SSE2 integer uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
.Pq Event 10H , Umask 10H
Counts number of SSE FP packed uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
.Pq Event 10H , Umask 20H
Counts number of SSE FP scalar uops executed.
.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
.Pq Event 10H , Umask 40H
Counts number of SSE* FP single precision uops executed.
.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
.Pq Event 10H , Umask 80H
Counts number of SSE* FP double precision uops executed.
.It Li SIMD_INT_128.PACKED_MPY
.Pq Event 12H , Umask 01H
Counts number of 128 bit SIMD integer multiply operations.
.It Li SIMD_INT_128.PACKED_SHIFT
.Pq Event 12H , Umask 02H
Counts number of 128 bit SIMD integer shift operations.
.It Li SIMD_INT_128.PACK
.Pq Event 12H , Umask 04H
Counts number of 128 bit SIMD integer pack operations.
.It Li SIMD_INT_128.UNPACK
.Pq Event 12H , Umask 08H
Counts number of 128 bit SIMD integer unpack operations.
.It Li SIMD_INT_128.PACKED_LOGICAL
.Pq Event 12H , Umask 10H
Counts number of 128 bit SIMD integer logical operations.
.It Li SIMD_INT_128.PACKED_ARITH
.Pq Event 12H , Umask 20H
Counts number of 128 bit SIMD integer arithmetic operations.
.It Li SIMD_INT_128.SHUFFLE_MOVE
.Pq Event 12H , Umask 40H
Counts number of 128 bit SIMD integer shuffle and move operations.
.It Li LOAD_DISPATCH.RS
.Pq Event 13H , Umask 01H
Counts number of loads dispatched from the Reservation Station that bypass
the Memory Order Buffer.
.It Li LOAD_DISPATCH.RS_DELAYED
.Pq Event 13H , Umask 02H
Counts the number of delayed RS dispatches at the stage latch. If an RS
dispatch cannot bypass to LB, it has another chance to dispatch from the
one-cycle delayed staging latch before it is written into the LB.
.It Li LOAD_DISPATCH.MOB
.Pq Event 13H , Umask 04H
Counts the number of loads dispatched from the Reservation Station to the
Memory Order Buffer.
.It Li LOAD_DISPATCH.ANY
.Pq Event 13H , Umask 07H
Counts all loads dispatched from the Reservation Station.
.It Li ARITH.CYCLES_DIV_BUSY
.Pq Event 14H , Umask 01H
Counts the number of cycles the divider is busy executing divide or square
root operations. The divide can be integer, X87 or Streaming SIMD Extensions
(SSE). The square root operation can be either X87 or SSE.
Set 'edge=1, invert=1, cmask=1' to count the number of divides.
Count may be incorrect when SMT is on.
.It Li ARITH.MUL
.Pq Event 14H , Umask 02H
Counts the number of multiply operations executed. This includes integer as
well as floating point multiply operations but excludes DPPS mul and MPSAD.
Count may be incorrect when SMT is on.
.It Li INST_QUEUE_WRITES
.Pq Event 17H , Umask 01H
Counts the number of instructions written into the instruction queue every
cycle.
.It Li INST_DECODED.DEC0
.Pq Event 18H , Umask 01H
Counts number of instructions that require decoder 0 to be decoded. Usually,
this means that the instruction maps to more than 1 uop.
.It Li TWO_UOP_INSTS_DECODED
.Pq Event 19H , Umask 01H
An instruction that generates two uops was decoded.
.It Li INST_QUEUE_WRITE_CYCLES
.Pq Event 1EH , Umask 01H
This event counts the number of cycles during which instructions are written
to the instruction queue. Dividing the number of instructions written to the
instruction queue (INST_QUEUE_WRITES) by this counter yields the
average number of instructions decoded each cycle. If this number is less
than four and the pipe stalls, this indicates that the decoder is failing to
decode enough instructions per cycle to sustain the 4-wide pipeline.
If SSE* instructions that are 6 bytes or longer arrive one after another,
then front end throughput may limit execution speed.
.It Li LSD_OVERFLOW
.Pq Event 20H , Umask 01H
Number of loops that cannot stream from the instruction queue.
.It Li L2_RQSTS.LD_HIT
.Pq Event 24H , Umask 01H
Counts number of loads that hit the L2 cache. L2 loads include both L1D
demand misses as well as L1D prefetches. L2 loads can be rejected for
various reasons. Only non-rejected loads are counted.
.It Li L2_RQSTS.LD_MISS
.Pq Event 24H , Umask 02H
Counts the number of loads that miss the L2 cache. L2 loads include both L1D
demand misses as well as L1D prefetches.
.It Li L2_RQSTS.LOADS
.Pq Event 24H , Umask 03H
Counts all L2 load requests. L2 loads include both L1D demand misses as well
as L1D prefetches.
.It Li L2_RQSTS.RFO_HIT
.Pq Event 24H , Umask 04H
Counts the number of store RFO requests that hit the L2 cache. L2 RFO
requests include both L1D demand RFO misses as well as L1D RFO prefetches.
Count includes WC memory requests, where the data is not fetched but the
permission to write the line is required.
.It Li L2_RQSTS.RFO_MISS
.Pq Event 24H , Umask 08H
Counts the number of store RFO requests that miss the L2 cache. L2 RFO
requests include both L1D demand RFO misses as well as L1D RFO prefetches.
.It Li L2_RQSTS.RFOS
.Pq Event 24H , Umask 0CH
Counts all L2 store RFO requests. L2 RFO requests include both L1D demand
RFO misses as well as L1D RFO prefetches.
.It Li L2_RQSTS.IFETCH_HIT
.Pq Event 24H , Umask 10H
Counts number of instruction fetches that hit the L2 cache. L2 instruction
fetches include both L1I demand misses as well as L1I instruction
prefetches.
.It Li L2_RQSTS.IFETCH_MISS
.Pq Event 24H , Umask 20H
Counts number of instruction fetches that miss the L2 cache. L2 instruction
fetches include both L1I demand misses as well as L1I instruction
prefetches.
.It Li L2_RQSTS.IFETCHES
.Pq Event 24H , Umask 30H
Counts all instruction fetches. L2 instruction fetches include both L1I
demand misses as well as L1I instruction prefetches.
.It Li L2_RQSTS.PREFETCH_HIT
.Pq Event 24H , Umask 40H
Counts L2 prefetch hits for both code and data.
.It Li L2_RQSTS.PREFETCH_MISS
.Pq Event 24H , Umask 80H
Counts L2 prefetch misses for both code and data.
.It Li L2_RQSTS.PREFETCHES
.Pq Event 24H , Umask C0H
Counts all L2 prefetches for both code and data.
.It Li L2_RQSTS.MISS
.Pq Event 24H , Umask AAH
Counts all L2 misses for both code and data.
.It Li L2_RQSTS.REFERENCES
.Pq Event 24H , Umask FFH
Counts all L2 requests for both code and data.
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
.Pq Event 26H , Umask 01H
Counts number of L2 data demand loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss. L2 demand loads are both L1D
demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
.Pq Event 26H , Umask 02H
Counts number of L2 data demand loads where the cache line to be loaded is
in the S (shared) state. L2 demand loads are both L1D demand misses and L1D
prefetches.
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
.Pq Event 26H , Umask 04H
Counts number of L2 data demand loads where the cache line to be loaded is
in the E (exclusive) state. L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
.Pq Event 26H , Umask 08H
Counts number of L2 data demand loads where the cache line to be loaded is
in the M (modified) state. L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.MESI
.Pq Event 26H , Umask 0FH
Counts all L2 data demand requests. L2 demand loads are both L1D demand
misses and L1D prefetches.
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
.Pq Event 26H , Umask 10H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
.Pq Event 26H , Umask 20H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the S (shared) state. A prefetch RFO will miss on an S state line, while
a prefetch read will hit on an S state line.
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
.Pq Event 26H , Umask 40H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the E (exclusive) state.
.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
.Pq Event 26H , Umask 80H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the M (modified) state.
.It Li L2_DATA_RQSTS.PREFETCH.MESI
.Pq Event 26H , Umask F0H
Counts all L2 prefetch requests.
.It Li L2_DATA_RQSTS.ANY
.Pq Event 26H , Umask FFH
Counts all L2 data requests.
.It Li L2_WRITE.RFO.I_STATE
.Pq Event 27H , Umask 01H
Counts number of L2 demand store RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss. The L1D prefetcher
does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.S_STATE
.Pq Event 27H , Umask 02H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the S (shared) state. The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.M_STATE
.Pq Event 27H , Umask 08H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the M (modified) state. The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.HIT
.Pq Event 27H , Umask 0EH
Counts number of L2 store RFO requests where the cache line to be loaded is
in either the S, E or M states. The L1D prefetcher does not issue a RFO
prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.MESI
.Pq Event 27H , Umask 0FH
Counts all L2 store RFO requests. The L1D prefetcher does not issue a RFO
prefetch.
This is a demand RFO request.
.It Li L2_WRITE.LOCK.I_STATE
.Pq Event 27H , Umask 10H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss.
.It Li L2_WRITE.LOCK.S_STATE
.Pq Event 27H , Umask 20H
Counts number of L2 lock RFO requests where the cache line to be loaded is
in the S (shared) state.
.It Li L2_WRITE.LOCK.E_STATE
.Pq Event 27H , Umask 40H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the E (exclusive) state.
.It Li L2_WRITE.LOCK.M_STATE
.Pq Event 27H , Umask 80H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the M (modified) state.
.It Li L2_WRITE.LOCK.HIT
.Pq Event 27H , Umask E0H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in either the S, E, or M state.
.It Li L2_WRITE.LOCK.MESI
.Pq Event 27H , Umask F0H
Counts all L2 demand lock RFO requests.
.It Li L1D_WB_L2.I_STATE
.Pq Event 28H , Umask 01H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the I (invalid) state, i.e. a cache miss.
.It Li L1D_WB_L2.S_STATE
.Pq Event 28H , Umask 02H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the S (shared) state.
.It Li L1D_WB_L2.E_STATE
.Pq Event 28H , Umask 04H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the E (exclusive) state.
.It Li L1D_WB_L2.M_STATE
.Pq Event 28H , Umask 08H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the M (modified) state.
.It Li L1D_WB_L2.MESI
.Pq Event 28H , Umask 0FH
Counts all L1 writebacks to the L2.
.It Li L3_LAT_CACHE.REFERENCE
.Pq Event 2EH , Umask 02H
Counts uncore Last Level Cache references. Because of cache hierarchy, cache
sizes and other implementation-specific characteristics, value comparison to
estimate performance differences is not recommended.
See Table A-1.
.It Li L3_LAT_CACHE.MISS
.Pq Event 2EH , Umask 01H
Counts uncore Last Level Cache misses. Because of cache hierarchy, cache sizes
and other implementation-specific characteristics, value comparison to
estimate performance differences is not recommended.
See Table A-1.
.It Li CPU_CLK_UNHALTED.THREAD_P
.Pq Event 3CH , Umask 00H
Counts the number of thread cycles while the thread is not in a halt state.
The thread enters the halt state when it is running the HLT instruction. The
core frequency may change from time to time due to power or thermal
throttling.
See Table A-1.
.It Li CPU_CLK_UNHALTED.REF_P
.Pq Event 3CH , Umask 01H
Increments at the frequency of TSC when not halted.
See Table A-1.
.It Li DTLB_MISSES.ANY
.Pq Event 49H , Umask 01H
Counts the number of misses in the STLB which cause a page walk.
.It Li DTLB_MISSES.WALK_COMPLETED
.Pq Event 49H , Umask 02H
Counts number of misses in the STLB which resulted in a completed page walk.
.It Li DTLB_MISSES.WALK_CYCLES
.Pq Event 49H , Umask 04H
Counts cycles of page walk due to misses in the STLB.
.It Li DTLB_MISSES.STLB_HIT
.Pq Event 49H , Umask 10H
Counts the number of DTLB first level misses that hit in the second level
TLB. This event is only relevant if the core contains multiple DTLB levels.
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 49H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li LOAD_HIT_PRE
.Pq Event 4CH , Umask 01H
Counts load operations sent to the L1 data cache while a previous SSE
prefetch instruction to the same cache line has started prefetching but has
not yet finished.
.It Li L1D_PREFETCH.REQUESTS
.Pq Event 4EH , Umask 01H
Counts number of hardware prefetch requests dispatched out of the prefetch
FIFO.
.It Li L1D_PREFETCH.MISS
.Pq Event 4EH , Umask 02H
Counts number of hardware prefetch requests that miss the L1D. There are two
prefetchers in the L1D: a streamer, which predicts that lines sequentially
after this one should be fetched, and the IP prefetcher, which remembers access
patterns for the current instruction. The streamer prefetcher stops on an
L1D hit, while the IP prefetcher does not.
.It Li L1D_PREFETCH.TRIGGERS
.Pq Event 4EH , Umask 04H
Counts number of prefetch requests triggered by the Finite State Machine and
pushed into the prefetch FIFO. Some of the prefetch requests are dropped due
to overwrites or competition between the IP index prefetcher and streamer
prefetcher. The prefetch FIFO contains 4 entries.
.It Li EPT.WALK_CYCLES
.Pq Event 4FH , Umask 10H
Counts Extended Page walk cycles.
.It Li L1D.REPL
.Pq Event 51H , Umask 01H
Counts the number of lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_REPL
.Pq Event 51H , Umask 02H
Counts the number of modified lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_EVICT
.Pq Event 51H , Umask 04H
Counts the number of modified lines evicted from the L1 data cache due to
replacement.
Counter 0, 1 only.
.It Li L1D.M_SNOOP_EVICT
.Pq Event 51H , Umask 08H
Counts the number of modified lines evicted from the L1 data cache due to
snoop HITM intervention.
Counter 0, 1 only.
.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
.Pq Event 52H , Umask 01H
Counts the number of cacheable load lock speculated instructions accepted
into the fill buffer.
.It Li L1D_CACHE_LOCK_FB_HIT
.Pq Event 53H , Umask 01H
Counts the number of cacheable load lock speculated or retired instructions
accepted into the fill buffer.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
.Pq Event 60H , Umask 01H
Counts weighted cycles of offcore demand data read requests. Does not
include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
.Pq Event 60H , Umask 02H
Counts weighted cycles of offcore demand code read requests. Does not
include L2 prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
.Pq Event 60H , Umask 04H
Counts weighted cycles of offcore demand RFO requests. Does not include L2
prefetch requests.
Counter 0 only.
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
.Pq Event 60H , Umask 08H
Counts weighted cycles of offcore read requests of any kind. Includes L2
prefetch requests.
Counter 0 only.
.It Li CACHE_LOCK_CYCLES.L1D_L2
.Pq Event 63H , Umask 01H
Cycle count during which the L1D and L2 are locked. A lock is asserted when
there is a locked memory access, due to uncacheable memory, a locked
operation that spans two cache lines, or a page walk from an uncacheable
page table.
Counter 0, 1 only. L1D and L2 locks have a very high performance penalty and
it is highly recommended to avoid such accesses.
.It Li CACHE_LOCK_CYCLES.L1D
.Pq Event 63H , Umask 02H
Counts the number of cycles that a cacheline in the L1 data cache unit is
locked.
Counter 0, 1 only.
.It Li IO_TRANSACTIONS
.Pq Event 6CH , Umask 01H
Counts the number of completed I/O transactions.
.It Li L1I.HITS
.Pq Event 80H , Umask 01H
Counts all instruction fetches that hit the L1 instruction cache.
.It Li L1I.MISSES
.Pq Event 80H , Umask 02H
Counts all instruction fetches that miss the L1I cache. This includes
instruction cache misses, streaming buffer misses, victim cache misses and
uncacheable fetches. An instruction fetch miss is counted only once and not
once for every cycle it is outstanding.
.It Li L1I.READS
.Pq Event 80H , Umask 03H
Counts all instruction fetches, including uncacheable fetches that bypass
the L1I.
.It Li L1I.CYCLES_STALLED
.Pq Event 80H , Umask 04H
Cycle counts for which an instruction fetch stalls due to a L1I cache miss,
ITLB miss or ITLB fault.
.It Li LARGE_ITLB.HIT
.Pq Event 82H , Umask 01H
Counts number of large ITLB hits.
.It Li ITLB_MISSES.ANY
.Pq Event 85H , Umask 01H
Counts the number of misses in all levels of the ITLB which cause a page
walk.
.It Li ITLB_MISSES.WALK_COMPLETED
.Pq Event 85H , Umask 02H
Counts number of misses in all levels of the ITLB which resulted in a
completed page walk.
.It Li ITLB_MISSES.WALK_CYCLES
.Pq Event 85H , Umask 04H
Counts ITLB miss page walk cycles.
.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 85H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li ILD_STALL.LCP
.Pq Event 87H , Umask 01H
Cycles Instruction Length Decoder stalls due to length changing prefixes:
66, 67 or REX.W (for EM64T) instructions which change the length of the
decoded instruction.
.It Li ILD_STALL.MRU
.Pq Event 87H , Umask 02H
Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
Most Recently Used (MRU) bypass.
.It Li ILD_STALL.IQ_FULL
.Pq Event 87H , Umask 04H
Stall cycles due to a full instruction queue.
.It Li ILD_STALL.REGEN
.Pq Event 87H , Umask 08H
Counts the number of regen stalls.
.It Li ILD_STALL.ANY
.Pq Event 87H , Umask 0FH
Counts any cycles the Instruction Length Decoder is stalled.
.It Li BR_INST_EXEC.COND
.Pq Event 88H , Umask 01H
Counts the number of conditional near branch instructions executed, but not
necessarily retired.
.It Li BR_INST_EXEC.DIRECT
.Pq Event 88H , Umask 02H
Counts all unconditional near branch instructions excluding calls and
indirect branches.
.It Li BR_INST_EXEC.INDIRECT_NON_CALL
.Pq Event 88H , Umask 04H
Counts the number of executed indirect near branch instructions that are not
calls.
.It Li BR_INST_EXEC.NON_CALLS
.Pq Event 88H , Umask 07H
Counts all non call near branch instructions executed, but not necessarily
retired.
.It Li BR_INST_EXEC.RETURN_NEAR
.Pq Event 88H , Umask 08H
Counts indirect near branches that have a return mnemonic.
.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
.Pq Event 88H , Umask 10H
Counts unconditional near call branch instructions, excluding non call
branch, executed.
.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
.Pq Event 88H , Umask 20H
Counts indirect near calls, including both register and memory indirect,
executed.
.It Li BR_INST_EXEC.NEAR_CALLS
.Pq Event 88H , Umask 30H
Counts all near call branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.TAKEN
.Pq Event 88H , Umask 40H
Counts taken near branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.ANY
.Pq Event 88H , Umask 7FH
Counts all near executed branches (not necessarily retired). This includes
only instructions and not micro-op branches. Frequent branching is not
necessarily a major performance issue. However, frequent branch
mispredictions may be a problem.
.It Li BR_MISP_EXEC.COND
.Pq Event 89H , Umask 01H
Counts the number of mispredicted conditional near branch instructions
executed, but not necessarily retired.
.It Li BR_MISP_EXEC.DIRECT
.Pq Event 89H , Umask 02H
Counts mispredicted macro unconditional near branch instructions, excluding
calls and indirect branches (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
.Pq Event 89H , Umask 04H
Counts the number of executed mispredicted indirect near branch instructions
that are not calls.
.It Li BR_MISP_EXEC.NON_CALLS
.Pq Event 89H , Umask 07H
Counts mispredicted non call near branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.RETURN_NEAR
.Pq Event 89H , Umask 08H
Counts mispredicted indirect branches that have a near return mnemonic.
.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
.Pq Event 89H , Umask 10H
Counts mispredicted non-indirect near calls executed (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
.Pq Event 89H , Umask 20H
Counts mispredicted indirect near calls executed, including both register
and memory indirect.
.It Li BR_MISP_EXEC.NEAR_CALLS
.Pq Event 89H , Umask 30H
Counts all mispredicted near call branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.TAKEN
.Pq Event 89H , Umask 40H
Counts executed mispredicted near branches that are taken, but not
necessarily retired.
.It Li BR_MISP_EXEC.ANY
.Pq Event 89H , Umask 7FH
Counts the number of mispredicted near branch instructions that were
executed, but not necessarily retired.
.It Li RESOURCE_STALLS.ANY
.Pq Event A2H , Umask 01H
Counts the number of Allocator resource related stalls. Includes register
renaming buffer entries and memory buffer entries. In addition to resource
related stalls, this event counts some other events. Includes stalls arising
during branch misprediction recovery, such as if retirement of the
mispredicted branch is delayed, and stalls arising while the store buffer is
draining from synchronizing operations.
Does not include stalls due to SuperQ (off core) queue full, too many cache
misses, etc.
.It Li RESOURCE_STALLS.LOAD
.Pq Event A2H , Umask 02H
Counts the cycles of stall due to lack of load buffer for load operation.
.It Li RESOURCE_STALLS.RS_FULL
.Pq Event A2H , Umask 04H
This event counts the number of cycles when the number of instructions in
the pipeline waiting for execution reaches the limit the processor can
handle. A high count of this event indicates that there are long latency
operations in the pipe (possibly load and store operations that miss the L2
cache, or instructions dependent upon instructions further down the pipeline
that have yet to retire).
When RS is full, new instructions cannot enter the reservation station and
start execution.
.It Li RESOURCE_STALLS.STORE
.Pq Event A2H , Umask 08H
This event counts the number of cycles that a resource related stall will
occur due to the number of store instructions reaching the limit of the
pipeline (i.e., all store buffers are used). The stall ends when a store
instruction commits its data to the cache or memory.
.It Li RESOURCE_STALLS.ROB_FULL
.Pq Event A2H , Umask 10H
Counts the cycles of stall due to re-order buffer full.
.It Li RESOURCE_STALLS.FPCW
.Pq Event A2H , Umask 20H
Counts the number of cycles while execution was stalled due to writing the
floating-point unit (FPU) control word.
.It Li RESOURCE_STALLS.MXCSR
.Pq Event A2H , Umask 40H
Stalls due to the MXCSR register rename occurring too close to a previous
MXCSR rename. The MXCSR provides control and status for the MMX registers.
.It Li RESOURCE_STALLS.OTHER
.Pq Event A2H , Umask 80H
Counts the number of cycles while execution was stalled due to other
resource issues.
.It Li MACRO_INSTS.FUSIONS_DECODED
.Pq Event A6H , Umask 01H
Counts the number of instructions decoded that are macro-fused but not
necessarily executed or retired.
.It Li BACLEAR_FORCE_IQ
.Pq Event A7H , Umask 01H
Counts number of times a BACLEAR was forced by the Instruction Queue. The IQ
is also responsible for providing conditional branch prediction direction
based on a static scheme and dynamic data provided by the L2 Branch
Prediction Unit. If the conditional branch target is not found in the Target
Array and the IQ predicts that the branch is taken, then the IQ will force
the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by
the BAC generates approximately an 8 cycle bubble in the instruction fetch
pipeline.
.It Li LSD.UOPS
.Pq Event A8H , Umask 01H
Counts the number of micro-ops delivered by the loop stream detector.
Use cmask=1 and invert to count cycles.
.It Li ITLB_FLUSH
.Pq Event AEH , Umask 01H
Counts the number of ITLB flushes.
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
.Pq Event B0H , Umask 01H
Counts number of offcore demand data read requests. Does not count L2
prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
.Pq Event B0H , Umask 02H
Counts number of offcore demand code read requests. Does not count L2
prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.RFO
.Pq Event B0H , Umask 04H
Counts number of offcore demand RFO requests. Does not count L2 prefetch
requests.
.It Li OFFCORE_REQUESTS.ANY.READ
.Pq Event B0H , Umask 08H
Counts number of offcore read requests. Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.RFO
.Pq Event B0H , Umask 10H
Counts number of offcore RFO requests. Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
.Pq Event B0H , Umask 40H
Counts number of L1D writebacks to the uncore.
.It Li OFFCORE_REQUESTS.ANY
.Pq Event B0H , Umask 80H
Counts all offcore requests.
.It Li UOPS_EXECUTED.PORT0
.Pq Event B1H , Umask 01H
Counts number of Uops executed that were issued on port 0. Port 0 handles
integer arithmetic, SIMD and FP add Uops.
.It Li UOPS_EXECUTED.PORT1
.Pq Event B1H , Umask 02H
Counts number of Uops executed that were issued on port 1. Port 1 handles
integer arithmetic, SIMD, integer shift, FP multiply and FP divide Uops.
.It Li UOPS_EXECUTED.PORT2_CORE
.Pq Event B1H , Umask 04H
Counts number of Uops executed that were issued on port 2. Port 2 handles
the load Uops. This is a core count only and cannot be collected per
thread.
.It Li UOPS_EXECUTED.PORT3_CORE
.Pq Event B1H , Umask 08H
Counts number of Uops executed that were issued on port 3. Port 3 handles
store Uops. This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT4_CORE
.Pq Event B1H , Umask 10H
Counts number of Uops executed that were issued on port 4. Port 4 handles
the value to be stored for the store Uops issued on port 3. This is a core
count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
.Pq Event B1H , Umask 1FH
Counts the number of cycles in which one or more uops issued on ports 0-4
are being executed. This is a core count only and cannot be collected per
thread.
.It Li UOPS_EXECUTED.PORT5
.Pq Event B1H , Umask 20H
Counts number of Uops executed that were issued on port 5.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
.Pq Event B1H , Umask 3FH
Counts the number of cycles in which one or more uops are being executed on
any port. This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT015
.Pq Event B1H , Umask 40H
Counts number of Uops executed that were issued on port 0, 1, or 5.
Use cmask=1, invert=1 to count stall cycles.
.It Li UOPS_EXECUTED.PORT234
.Pq Event B1H , Umask 80H
Counts number of Uops executed that were issued on port 2, 3, or 4.
.It Li OFFCORE_REQUESTS_SQ_FULL
.Pq Event B2H , Umask 01H
Counts number of cycles the SQ is full to handle off-core requests.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
.Pq Event B3H , Umask 01H
Counts weighted cycles of snoopq requests for data. Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
.Pq Event B3H , Umask 02H
Counts weighted cycles of snoopq invalidate requests. Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
.Pq Event B3H , Umask 04H
Counts weighted cycles of snoopq requests for code. Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS.CODE
.Pq Event B4H , Umask 01H
Counts the number of snoop code requests.
.It Li SNOOPQ_REQUESTS.DATA
.Pq Event B4H , Umask 02H
Counts the number of snoop data requests.
.It Li SNOOPQ_REQUESTS.INVALIDATE
.Pq Event B4H , Umask 04H
Counts the number of snoop invalidate requests.
.It Li OFF_CORE_RESPONSE_0
.Pq Event B7H , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Requires programming MSR 01A6H.
.It Li SNOOP_RESPONSE.HIT
.Pq Event B8H , Umask 01H
Counts HIT snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITE
.Pq Event B8H , Umask 02H
Counts HIT E snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITM
.Pq Event B8H , Umask 04H
Counts HIT M snoop response sent by this thread in response to a snoop
request.
.It Li OFF_CORE_RESPONSE_1
.Pq Event BBH , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Use MSR 01A7H.
.It Li INST_RETIRED.ANY_P
.Pq Event C0H , Umask 01H
See Table A-1.
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
INST_RETIRED.ANY_P is counted by a programmable counter and is an
architectural performance event. Event is supported if CPUID.A.EBX[1] = 0.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
count as retired instructions.
.It Li INST_RETIRED.X87
.Pq Event C0H , Umask 02H
Counts the number of floating point computational operations retired:
floating point computational operations executed by the assist handler and
sub-operations of complex floating point instructions like transcendental
instructions.
.It Li INST_RETIRED.MMX
.Pq Event C0H , Umask 04H
Counts the number of retired MMX instructions.
.It Li UOPS_RETIRED.ANY
.Pq Event C2H , Umask 01H
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
others=1; maximum count of 8 per cycle). Most instructions are composed of
one or two micro-ops. Some instructions are decoded into longer sequences
such as repeat instructions, floating point transcendental instructions, and
assists.
Use cmask=1 and invert to count active cycles or stalled cycles.
.It Li UOPS_RETIRED.RETIRE_SLOTS
.Pq Event C2H , Umask 02H
Counts the number of retirement slots used each cycle.
.It Li UOPS_RETIRED.MACRO_FUSED
.Pq Event C2H , Umask 04H
Counts number of macro-fused uops retired.
.It Li MACHINE_CLEARS.CYCLES
.Pq Event C3H , Umask 01H
Counts the cycles machine clear is asserted.
.It Li MACHINE_CLEARS.MEM_ORDER
.Pq Event C3H , Umask 02H
Counts the number of machine clears due to memory order conflicts.
.It Li MACHINE_CLEARS.SMC
.Pq Event C3H , Umask 04H
Counts the number of times that a program writes to a code section.
Self-modifying code causes a severe penalty in all Intel 64 and IA-32
processors. The modified cache line is written back to the L2 and L3 caches.
.It Li BR_INST_RETIRED.ALL_BRANCHES
.Pq Event C4H , Umask 00H
See Table A-1.
.It Li BR_INST_RETIRED.CONDITIONAL
.Pq Event C4H , Umask 01H
Counts the number of conditional branch instructions retired.
.It Li BR_INST_RETIRED.NEAR_CALL
.Pq Event C4H , Umask 02H
Counts the number of direct and indirect near unconditional calls retired.
.It Li BR_INST_RETIRED.ALL_BRANCHES
.Pq Event C4H , Umask 04H
Counts the number of branch instructions retired.
.It Li BR_MISP_RETIRED.ALL_BRANCHES
.Pq Event C5H , Umask 00H
See Table A-1.
.It Li BR_MISP_RETIRED.CONDITIONAL
.Pq Event C5H , Umask 01H
Counts mispredicted conditional retired calls.
.It Li BR_MISP_RETIRED.NEAR_CALL
.Pq Event C5H , Umask 02H
Counts mispredicted direct and indirect near unconditional retired calls.
.It Li BR_MISP_RETIRED.ALL_BRANCHES
.Pq Event C5H , Umask 04H
Counts all mispredicted retired calls.
.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
.Pq Event C7H , Umask 01H
Counts SIMD packed single-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
.Pq Event C7H , Umask 02H
Counts SIMD scalar single-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
.Pq Event C7H , Umask 04H
Counts SIMD packed double-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
.Pq Event C7H , Umask 08H
Counts SIMD scalar double-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
.Pq Event C7H , Umask 10H
Counts 128-bit SIMD vector integer Uops retired.
.It Li ITLB_MISS_RETIRED
.Pq Event C8H , Umask 20H
Counts the number of retired instructions that missed the ITLB when the
instruction was fetched.
.It Li MEM_LOAD_RETIRED.L1D_HIT
.Pq Event CBH , Umask 01H
Counts number of retired loads that hit the L1 data cache.
.It Li MEM_LOAD_RETIRED.L2_HIT
.Pq Event CBH , Umask 02H
Counts number of retired loads that hit the L2 data cache.
.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
.Pq Event CBH , Umask 04H
Counts number of retired loads that hit their own, unshared lines in the L3
cache.
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
.Pq Event CBH , Umask 08H
Counts number of retired loads that hit in a sibling core's L2 (on die
core). Since the L3 is inclusive of all cores on the package, this is an L3
hit. This counts both clean and modified hits.
.It Li MEM_LOAD_RETIRED.L3_MISS
.Pq Event CBH , Umask 10H
Counts number of retired loads that miss the L3 cache. The load was
satisfied by a remote socket, local memory or an IOH.
.It Li MEM_LOAD_RETIRED.HIT_LFB
.Pq Event CBH , Umask 40H
Counts number of retired loads that miss the L1D and the address is located
in an allocated line fill buffer and will soon be committed to cache. This
is counting secondary L1D misses.
.It Li MEM_LOAD_RETIRED.DTLB_MISS
.Pq Event CBH , Umask 80H
Counts the number of retired loads that missed the DTLB. The DTLB miss is
not counted if the load operation causes a fault. This event counts loads
from cacheable memory only. The event does not count loads by software
prefetches. Counts both primary and secondary misses to the TLB.
.It Li FP_MMX_TRANS.TO_FP
.Pq Event CCH , Umask 01H
Counts the first floating-point instruction following any MMX instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.TO_MMX
.Pq Event CCH , Umask 02H
Counts the first MMX instruction following a floating-point instruction. You
can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.ANY
.Pq Event CCH , Umask 03H
Counts all transitions from floating point to MMX instructions and from MMX
instructions to floating point instructions. You can use this event to
estimate the penalties for the transitions between floating-point and MMX
technology states.
.It Li MACRO_INSTS.DECODED
.Pq Event D0H , Umask 01H
Counts the number of instructions decoded (but not necessarily executed or
retired).
.It Li UOPS_DECODED.STALL_CYCLES
.Pq Event D1H , Umask 01H
Counts the cycles of decoder stalls.
.It Li UOPS_DECODED.MS
.Pq Event D1H , Umask 02H
Counts the number of Uops decoded by the Microcode Sequencer, MS. The MS
delivers uops when the instruction is more than 4 uops long or a microcode
assist is occurring.
.It Li UOPS_DECODED.ESP_FOLDING
.Pq Event D1H , Umask 04H
Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
ret, etc. ESP instructions do not generate a Uop to increment or decrement
ESP. Instead, they update an ESP_Offset register that keeps track of the
delta to the current value of the ESP register.
.It Li UOPS_DECODED.ESP_SYNC
.Pq Event D1H , Umask 08H
Counts number of stack pointer (ESP) sync operations where an ESP
instruction is corrected by adding the ESP offset register to the current
value of the ESP register.
.It Li RAT_STALLS.FLAGS
.Pq Event D2H , Umask 01H
Counts the number of cycles during which execution stalled due to several
reasons, one of which is a partial flag register stall.
A partial register
stall may occur when two conditions are met: 1) an instruction modifies
some, but not all, of the flags in the flag register and 2) the next
instruction, which depends on flags, depends on flags that were not modified
by this instruction.
.It Li RAT_STALLS.REGISTERS
.Pq Event D2H , Umask 02H
This event counts the number of cycles instruction execution latency became
longer than the defined latency because the instruction used a register that
was partially written by a previous instruction.
.It Li RAT_STALLS.ROB_READ_PORT
.Pq Event D2H , Umask 04H
Counts the number of cycles when ROB read port stalls occurred, which did
not allow new micro-ops to enter the out-of-order pipeline. Note that, at
this stage in the pipeline, additional stalls may occur at the same cycle
and prevent the stalled micro-ops from entering the pipe. In such a case,
micro-ops retry entering the execution pipe in the next cycle and the
ROB-read port stall is counted again.
.It Li RAT_STALLS.SCOREBOARD
.Pq Event D2H , Umask 08H
Counts the cycles stalled due to microarchitecturally required
serialization. Microcode scoreboarding stalls.
.It Li RAT_STALLS.ANY
.Pq Event D2H , Umask 0FH
Counts all Register Allocation Table stall cycles. These include cycles when
ROB read port stalls occurred, which did not allow new micro-ops to enter the
execution pipe; cycles when partial register stalls occurred; cycles when
flag stalls occurred; and cycles when floating-point unit (FPU) status word
stalls occurred. To count each of these conditions separately use the events:
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
RAT_STALLS.FPSW.
.It Li SEG_RENAME_STALLS
.Pq Event D4H , Umask 01H
Counts the number of stall cycles due to the lack of renaming resources for
the ES, DS, FS, and GS segment registers.
If a segment is renamed but not
retired and a second update to the same segment occurs, a stall occurs in
the front-end of the pipeline until the renamed segment retires.
.It Li ES_REG_RENAMES
.Pq Event D5H , Umask 01H
Counts the number of times the ES segment register is renamed.
.It Li UOP_UNFUSION
.Pq Event DBH , Umask 01H
Counts unfusion events due to floating point exception to a fused uop.
.It Li BR_INST_DECODED
.Pq Event E0H , Umask 01H
Counts the number of branch instructions decoded.
.It Li BPU_MISSED_CALL_RET
.Pq Event E5H , Umask 01H
Counts number of times the Branch Prediction Unit missed predicting a call
or return branch.
.It Li BACLEAR.CLEAR
.Pq Event E6H , Umask 01H
Counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is
corrected by the Branch Address Calculator at the front end. This can occur
if the code has many branches such that they cannot be consumed by the BPU.
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
in the instruction fetch pipeline. The effect on total execution time
depends on the surrounding code.
.It Li BACLEAR.BAD_TARGET
.Pq Event E6H , Umask 02H
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
conditional branch instructions in which there was a target hit but the
direction was wrong. Each BACLEAR asserted by the BAC generates
approximately an 8 cycle bubble in the instruction fetch pipeline.
.It Li BPU_CLEARS.EARLY
.Pq Event E8H , Umask 01H
Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
branch after incorrectly assuming that it was not taken.
The BPU clear leads to a 2 cycle bubble in the Front End.
.It Li BPU_CLEARS.LATE
.Pq Event E8H , Umask 02H
Counts late Branch Prediction Unit clears due to Most Recently Used
conflicts. The BPU clear leads to a 3 cycle bubble in the Front End.
.It Li THREAD_ACTIVE
.Pq Event ECH , Umask 01H
Counts cycles threads are active.
.It Li L2_TRANSACTIONS.LOAD
.Pq Event F0H , Umask 01H
Counts L2 load operations due to HW prefetch or demand loads.
.It Li L2_TRANSACTIONS.RFO
.Pq Event F0H , Umask 02H
Counts L2 RFO operations due to HW prefetch or demand RFOs.
.It Li L2_TRANSACTIONS.IFETCH
.Pq Event F0H , Umask 04H
Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
.It Li L2_TRANSACTIONS.PREFETCH
.Pq Event F0H , Umask 08H
Counts L2 prefetch operations.
.It Li L2_TRANSACTIONS.L1D_WB
.Pq Event F0H , Umask 10H
Counts L1D writeback operations to the L2.
.It Li L2_TRANSACTIONS.FILL
.Pq Event F0H , Umask 20H
Counts L2 cache line fill operations due to load, RFO, L1D writeback or
prefetch.
.It Li L2_TRANSACTIONS.WB
.Pq Event F0H , Umask 40H
Counts L2 writeback operations to the L3.
.It Li L2_TRANSACTIONS.ANY
.Pq Event F0H , Umask 80H
Counts all L2 cache operations.
.It Li L2_LINES_IN.S_STATE
.Pq Event F1H , Umask 02H
Counts the number of cache lines allocated in the L2 cache in the S (shared)
state.
.It Li L2_LINES_IN.E_STATE
.Pq Event F1H , Umask 04H
Counts the number of cache lines allocated in the L2 cache in the E
(exclusive) state.
.It Li L2_LINES_IN.ANY
.Pq Event F1H , Umask 07H
Counts the number of cache lines allocated in the L2 cache.
.It Li L2_LINES_OUT.DEMAND_CLEAN
.Pq Event F2H , Umask 01H
Counts L2 clean cache lines evicted by a demand request.
.It Li L2_LINES_OUT.DEMAND_DIRTY
.Pq Event F2H , Umask 02H
Counts L2 dirty (modified) cache lines evicted by a demand request.
.It Li L2_LINES_OUT.PREFETCH_CLEAN
.Pq Event F2H , Umask 04H
Counts L2 clean cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.PREFETCH_DIRTY
.Pq Event F2H , Umask 08H
Counts L2 modified cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.ANY
.Pq Event F2H , Umask 0FH
Counts all L2 cache lines evicted for any reason.
.It Li SQ_MISC.LRU_HINTS
.Pq Event F4H , Umask 04H
Counts number of Super Queue LRU hints sent to L3.
.It Li SQ_MISC.SPLIT_LOCK
.Pq Event F4H , Umask 10H
Counts the number of SQ lock splits across a cache line.
.It Li SQ_FULL_STALL_CYCLES
.Pq Event F6H , Umask 01H
Counts cycles the Super Queue is full. Neither of the threads on this core
will be able to access the uncore.
.It Li FP_ASSIST.ALL
.Pq Event F7H , Umask 01H
Counts the number of floating point operations executed that required
micro-code assist intervention. Assists are required in the following cases:
SSE instructions (denormal input when the DAZ flag is off or underflow
result when the FTZ flag is off); x87 instructions (NaN or denormal
loaded to a register or used as input from memory, division by 0 or
underflow output).
.It Li FP_ASSIST.OUTPUT
.Pq Event F7H , Umask 02H
Counts number of floating point micro-code assists when the output value
(destination register) is invalid.
.It Li FP_ASSIST.INPUT
.Pq Event F7H , Umask 04H
Counts number of floating point micro-code assists when the input value (one
of the source operands to an FP instruction) is invalid.
.It Li SIMD_INT_64.PACKED_MPY
.Pq Event FDH , Umask 01H
Counts number of SIMD integer 64 bit packed multiply operations.
.It Li SIMD_INT_64.PACKED_SHIFT
.Pq Event FDH , Umask 02H
Counts number of SIMD integer 64 bit packed shift operations.
.It Li SIMD_INT_64.PACK
.Pq Event FDH , Umask 04H
Counts number of SIMD integer 64 bit pack operations.
.It Li SIMD_INT_64.UNPACK
.Pq Event FDH , Umask 08H
Counts number of SIMD integer 64 bit unpack operations.
.It Li SIMD_INT_64.PACKED_LOGICAL
.Pq Event FDH , Umask 10H
Counts number of SIMD integer 64 bit logical operations.
.It Li SIMD_INT_64.PACKED_ARITH
.Pq Event FDH , Umask 20H
Counts number of SIMD integer 64 bit arithmetic operations.
.It Li SIMD_INT_64.SHUFFLE_MOVE
.Pq Event FDH , Umask 40H
Counts number of SIMD integer 64 bit shift or move operations.
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.ucf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.corei7 3 ,
.Xr pmc.corei7uc 3 ,
.Xr pmc.westmereuc 3 ,
.Xr pmc.tsc 3 ,
.Xr pmc_cpuinfo 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .