.\" Copyright (c) 2010 Fabien Thomas.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd February 25, 2012
.Dt PMC.WESTMERE 3
.Os
.Sh NAME
.Nm pmc.westmere
.Nd measurement events for
.Tn Intel
.Tn Westmere
family CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
.Tn Intel
.Tn "Westmere"
CPUs contain PMCs conforming to version 2 of the
.Tn Intel
performance measurement architecture.
These CPUs may contain up to three classes of PMCs:
.Bl -tag -width "Li PMC_CLASS_IAP"
.It Li PMC_CLASS_IAF
Fixed-function counters that count only one hardware event per counter.
.It Li PMC_CLASS_IAP
Programmable counters that may be configured to count one of a defined
set of hardware events.
.El
.Pp
The number of PMCs available in each class and their widths need to be
determined at run time by calling
.Xr pmc_cpuinfo 3 .
.Pp
Intel Westmere PMCs are documented in
.Rs
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
.%T "Volume 3B: System Programming Guide, Part 2"
.%N "Order Number: 253669-033US"
.%D December 2009
.%Q "Intel Corporation"
.Re
.Ss WESTMERE FIXED FUNCTION PMCS
These PMCs and their supported events are documented in
.Xr pmc.iaf 3 .
.Ss WESTMERE PROGRAMMABLE PMCS
The programmable PMCs support the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
Event specifiers for these PMCs support the following common
qualifiers:
.Bl -tag -width indent
.It Li rsp= Ns Ar value
Configure the Off-core Response bits.
.Bl -tag -width indent
.It Li DMND_DATA_RD
Counts the number of demand and DCU prefetch data reads of full
and partial cachelines as well as demand data page table entry
cacheline reads.
Does not count L2 data read prefetches or
instruction fetches.
.It Li DMND_RFO
Counts the number of demand and DCU prefetch reads for ownership
(RFO) requests generated by a write to a data cacheline.
Does not count L2 RFO.
.It Li DMND_IFETCH
Counts the number of demand and DCU prefetch instruction cacheline
reads.
Does not count L2 code read prefetches.
.It Li WB
Counts the number of writeback (modified to exclusive) transactions.
.It Li PF_DATA_RD
Counts the number of data cacheline reads generated by L2 prefetchers.
.It Li PF_RFO
Counts the number of RFO requests generated by L2 prefetchers.
.It Li PF_IFETCH
Counts the number of code reads generated by L2 prefetchers.
.It Li OTHER
Counts one of the following transaction types, including L3 invalidate,
I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
lock, unlock, split lock.
.It Li UNCORE_HIT
L3 Hit: local or remote home requests that hit L3 cache in the uncore
with no coherency actions required (snooping).
.It Li OTHER_CORE_HIT_SNP
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where no modified
copies were found (clean).
.It Li OTHER_CORE_HITM
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where modified
copies were found (HITM).
.It Li REMOTE_CACHE_FWD
L3 Miss: local homed requests that missed the L3 cache and were serviced
by forwarded data following a cross package snoop where no modified
copies were found.
Remote home requests are not counted.
.It Li REMOTE_DRAM
L3 Miss: remote home requests that missed the L3 cache and were serviced
by remote DRAM.
.It Li LOCAL_DRAM
L3 Miss: local home requests that missed the L3 cache and were serviced
by local DRAM.
.It Li NON_DRAM
Non-DRAM requests that were serviced by IOH.
.El
.It Li cmask= Ns Ar value
Configure the PMC to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the PMC to count the number of de-asserted to asserted
transitions of the conditions expressed by the other qualifiers.
If specified, the counter will increment only once whenever a
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li cmask
qualifier is present, making the counter increment when the number of
events per cycle is less than the value specified by the
.Dq Li cmask
qualifier.
.It Li os
Configure the PMC to count events happening at processor privilege
level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
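.Pp
To illustrate how the qualifiers above combine with an event code and
unit mask, the following C sketch packs them into a 32-bit value using
the IA32_PERFEVTSELx bit positions given in the Intel manual.
The function name westmere_evtsel and the WQ_* flags are hypothetical
names invented for this sketch and are not part of libpmc.
How the kernel actually programs the register is not described in this
page, so treat this as an assumption-laden illustration only.

```c
#include <stdint.h>

/* Hypothetical qualifier flags for this sketch; not libpmc identifiers. */
#define WQ_USR   (1u << 0)	/* "usr" qualifier */
#define WQ_OS    (1u << 1)	/* "os" qualifier */
#define WQ_EDGE  (1u << 2)	/* "edge" qualifier */
#define WQ_INV   (1u << 3)	/* "inv" qualifier */

/* IA32_PERFEVTSELx bit positions as documented in the Intel SDM. */
#define EVTSEL_USR   (1u << 16)	/* count at privilege levels 1, 2 or 3 */
#define EVTSEL_OS    (1u << 17)	/* count at privilege level 0 */
#define EVTSEL_EDGE  (1u << 18)	/* count de-asserted to asserted edges */
#define EVTSEL_INV   (1u << 23)	/* invert the cmask comparison */

/*
 * Pack an event code, unit mask, qualifier flags and counter mask
 * into an event-select value.  The enable bit is deliberately left
 * clear; this sketch only shows the field layout.
 */
static uint32_t
westmere_evtsel(uint8_t event, uint8_t umask, unsigned quals, uint8_t cmask)
{
	uint32_t v = (uint32_t)event | ((uint32_t)umask << 8);

	/* If neither "os" nor "usr" is given, the default is both. */
	if ((quals & (WQ_USR | WQ_OS)) == 0)
		quals |= WQ_USR | WQ_OS;

	if (quals & WQ_USR)
		v |= EVTSEL_USR;
	if (quals & WQ_OS)
		v |= EVTSEL_OS;
	if (quals & WQ_EDGE)
		v |= EVTSEL_EDGE;
	if (quals & WQ_INV)
		v |= EVTSEL_INV;

	/* The cmask threshold occupies bits 24 through 31. */
	v |= (uint32_t)cmask << 24;
	return (v);
}
```

Under these assumptions, an event with code 0EH and unit mask 01H,
counted with the inv qualifier and cmask=1 and with neither os nor usr
given, would be encoded by westmere_evtsel(0x0E, 0x01, WQ_INV, 1),
which evaluates to 0x0183010E.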
.Ss Event Specifiers (Programmable PMCs)
Westmere programmable PMCs support the following events:
.Bl -tag -width indent
.It Li LOAD_BLOCK.OVERLAP_STORE
.Pq Event 03H , Umask 02H
Loads that partially overlap an earlier store.
.It Li SB_DRAIN.ANY
.Pq Event 04H , Umask 07H
All store buffer stall cycles.
.It Li MISALIGN_MEMORY.STORE
.Pq Event 05H , Umask 02H
All stores referenced with a misaligned address.
.It Li STORE_BLOCKS.AT_RET
.Pq Event 06H , Umask 04H
Counts number of loads delayed with at-Retirement block code.
The following
loads need to be executed at retirement and wait for all senior stores on
the same thread to be drained: load splitting across 4K boundary (page
split), load accessing uncacheable (UC or USWC) memory, load lock, and load
with page table in UC or USWC memory region.
.It Li STORE_BLOCKS.L1D_BLOCK
.Pq Event 06H , Umask 08H
Cacheable loads delayed with L1D block code.
.It Li PARTIAL_ADDRESS_ALIAS
.Pq Event 07H , Umask 01H
Counts false dependency due to partial address aliasing.
.It Li DTLB_LOAD_MISSES.ANY
.Pq Event 08H , Umask 01H
Counts all load misses that cause a page walk.
.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
.Pq Event 08H , Umask 02H
Counts number of completed page walks due to load miss in the STLB.
.It Li DTLB_LOAD_MISSES.WALK_CYCLES
.Pq Event 08H , Umask 04H
Cycles PMH is busy with a page walk due to a load miss in the STLB.
.It Li DTLB_LOAD_MISSES.STLB_HIT
.Pq Event 08H , Umask 10H
Number of cache load STLB hits.
.It Li DTLB_LOAD_MISSES.PDE_MISS
.Pq Event 08H , Umask 20H
Number of DTLB cache load misses where the low part of the linear to
physical address translation was missed.
.It Li MEM_INST_RETIRED.LOADS
.Pq Event 0BH , Umask 01H
Counts the number of instructions with an architecturally-visible load
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.STORES
.Pq Event 0BH , Umask 02H
Counts the number of instructions with an architecturally-visible store
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
.Pq Event 0BH , Umask 10H
Counts the number of instructions exceeding the latency specified with
the ld_lat facility.
Used in conjunction with the ld_lat facility.
.It Li MEM_STORE_RETIRED.DTLB_MISS
.Pq Event 0CH , Umask 01H
The event counts the number of retired stores that missed the DTLB.
The DTLB miss is not counted if the store operation causes a fault.
Does not count prefetches.
Counts both primary and secondary misses to the TLB.
.It Li UOPS_ISSUED.ANY
.Pq Event 0EH , Umask 01H
Counts the number of Uops issued by the Register Allocation Table to the
Reservation Station, i.e. the UOPs issued from the front end to the back
end.
.It Li UOPS_ISSUED.STALLED_CYCLES
.Pq Event 0EH , Umask 01H
Counts the number of cycles no Uops issued by the Register Allocation Table
to the Reservation Station, i.e. the UOPs issued from the front end to the
back end.
Set invert=1, cmask=1.
.It Li UOPS_ISSUED.FUSED
.Pq Event 0EH , Umask 02H
Counts the number of fused Uops that were issued from the Register
Allocation Table to the Reservation Station.
.It Li MEM_UNCORE_RETIRED.LOCAL_HITM
.Pq Event 0FH , Umask 02H
Load instructions retired that HIT modified data in sibling core (Precise
Event).
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT
.Pq Event 0FH , Umask 08H
Load instructions retired local dram and remote cache HIT data sources
(Precise Event).
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
.Pq Event 0FH , Umask 10H
Load instructions retired with a data source of local DRAM or locally homed
remote cache HITM (Precise Event).
.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
.Pq Event 0FH , Umask 20H
Load instructions retired remote DRAM and remote home-remote cache HITM
(Precise Event).
.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
.Pq Event 0FH , Umask 80H
Load instructions retired I/O (Precise Event).
.It Li FP_COMP_OPS_EXE.X87
.Pq Event 10H , Umask 01H
Counts the number of FP Computational Uops Executed.
The number of FADD,
FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer
DIVs, and IDIVs.
This event does not distinguish an FADD used in the middle
of a transcendental flow from a separate FADD instruction.
.It Li FP_COMP_OPS_EXE.MMX
.Pq Event 10H , Umask 02H
Counts number of MMX Uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP
.Pq Event 10H , Umask 04H
Counts number of SSE and SSE2 FP uops executed.
.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
.Pq Event 10H , Umask 08H
Counts number of SSE2 integer uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
.Pq Event 10H , Umask 10H
Counts number of SSE FP packed uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
.Pq Event 10H , Umask 20H
Counts number of SSE FP scalar uops executed.
.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
.Pq Event 10H , Umask 40H
Counts number of SSE* FP single precision uops executed.
.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
.Pq Event 10H , Umask 80H
Counts number of SSE* FP double precision uops executed.
.It Li SIMD_INT_128.PACKED_MPY
.Pq Event 12H , Umask 01H
Counts number of 128 bit SIMD integer multiply operations.
.It Li SIMD_INT_128.PACKED_SHIFT
.Pq Event 12H , Umask 02H
Counts number of 128 bit SIMD integer shift operations.
.It Li SIMD_INT_128.PACK
.Pq Event 12H , Umask 04H
Counts number of 128 bit SIMD integer pack operations.
.It Li SIMD_INT_128.UNPACK
.Pq Event 12H , Umask 08H
Counts number of 128 bit SIMD integer unpack operations.
.It Li SIMD_INT_128.PACKED_LOGICAL
.Pq Event 12H , Umask 10H
Counts number of 128 bit SIMD integer logical operations.
.It Li SIMD_INT_128.PACKED_ARITH
.Pq Event 12H , Umask 20H
Counts number of 128 bit SIMD integer arithmetic operations.
.It Li SIMD_INT_128.SHUFFLE_MOVE
.Pq Event 12H , Umask 40H
Counts number of 128 bit SIMD integer shuffle and move operations.
.It Li LOAD_DISPATCH.RS
.Pq Event 13H , Umask 01H
Counts number of loads dispatched from the Reservation Station that bypass
the Memory Order Buffer.
.It Li LOAD_DISPATCH.RS_DELAYED
.Pq Event 13H , Umask 02H
Counts the number of delayed RS dispatches at the stage latch.
If an RS dispatch cannot bypass to LB, it has another chance to dispatch
from the one-cycle delayed staging latch before it is written into the LB.
.It Li LOAD_DISPATCH.MOB
.Pq Event 13H , Umask 04H
Counts the number of loads dispatched from the Reservation Station to the
Memory Order Buffer.
.It Li LOAD_DISPATCH.ANY
.Pq Event 13H , Umask 07H
Counts all loads dispatched from the Reservation Station.
.It Li ARITH.CYCLES_DIV_BUSY
.Pq Event 14H , Umask 01H
Counts the number of cycles the divider is busy executing divide or square
root operations.
The divide can be integer, X87 or Streaming SIMD Extensions (SSE).
The square root operation can be either X87 or SSE.
Set 'edge=1, invert=1, cmask=1' to count the number of divides.
Count may be incorrect when SMT is on.
.It Li ARITH.MUL
.Pq Event 14H , Umask 02H
Counts the number of multiply operations executed.
This includes integer as
well as floating point multiply operations but excludes DPPS mul and MPSAD.
Count may be incorrect when SMT is on.
.It Li INST_QUEUE_WRITES
.Pq Event 17H , Umask 01H
Counts the number of instructions written into the instruction queue every
cycle.
.It Li INST_DECODED.DEC0
.Pq Event 18H , Umask 01H
Counts number of instructions that require decoder 0 to be decoded.
Usually, this means that the instruction maps to more than 1 uop.
.It Li TWO_UOP_INSTS_DECODED
.Pq Event 19H , Umask 01H
An instruction that generates two uops was decoded.
.It Li INST_QUEUE_WRITE_CYCLES
.Pq Event 1EH , Umask 01H
This event counts the number of cycles during which instructions are written
to the instruction queue.
Dividing the number of
instructions written to the instruction queue (INST_QUEUE_WRITES) by this
counter yields the
average number of instructions decoded each cycle.
If this number is less
than four and the pipe stalls, this indicates that the decoder is failing to
decode enough instructions per cycle to sustain the 4-wide pipeline.
If SSE* instructions that are 6 bytes or longer arrive one after another,
then front end throughput may limit execution speed.
In such case,
.It Li LSD_OVERFLOW
.Pq Event 20H , Umask 01H
Number of loops that cannot stream from the instruction queue.
.It Li L2_RQSTS.LD_HIT
.Pq Event 24H , Umask 01H
Counts number of loads that hit the L2 cache.
L2 loads include both L1D demand misses as well as L1D prefetches.
L2 loads can be rejected for various reasons.
Only non rejected loads are counted.
.It Li L2_RQSTS.LD_MISS
.Pq Event 24H , Umask 02H
Counts the number of loads that miss the L2 cache.
L2 loads include both L1D demand misses as well as L1D prefetches.
.It Li L2_RQSTS.LOADS
.Pq Event 24H , Umask 03H
Counts all L2 load requests.
L2 loads include both L1D demand misses as well as L1D prefetches.
.It Li L2_RQSTS.RFO_HIT
.Pq Event 24H , Umask 04H
Counts the number of store RFO requests that hit the L2 cache.
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
prefetches.
Count includes WC memory requests, where the data is not fetched but the
permission to write the line is required.
.It Li L2_RQSTS.RFO_MISS
.Pq Event 24H , Umask 08H
Counts the number of store RFO requests that miss the L2 cache.
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
prefetches.
.It Li L2_RQSTS.RFOS
.Pq Event 24H , Umask 0CH
Counts all L2 store RFO requests.
L2 RFO requests include both L1D demand
RFO misses as well as L1D RFO prefetches.
.It Li L2_RQSTS.IFETCH_HIT
.Pq Event 24H , Umask 10H
Counts number of instruction fetches that hit the L2 cache.
L2 instruction fetches include both L1I demand misses as well as L1I
instruction prefetches.
.It Li L2_RQSTS.IFETCH_MISS
.Pq Event 24H , Umask 20H
Counts number of instruction fetches that miss the L2 cache.
L2 instruction fetches include both L1I demand misses as well as L1I
instruction prefetches.
.It Li L2_RQSTS.IFETCHES
.Pq Event 24H , Umask 30H
Counts all instruction fetches.
L2 instruction fetches include both L1I
demand misses as well as L1I instruction prefetches.
.It Li L2_RQSTS.PREFETCH_HIT
.Pq Event 24H , Umask 40H
Counts L2 prefetch hits for both code and data.
.It Li L2_RQSTS.PREFETCH_MISS
.Pq Event 24H , Umask 80H
Counts L2 prefetch misses for both code and data.
.It Li L2_RQSTS.PREFETCHES
.Pq Event 24H , Umask C0H
Counts all L2 prefetches for both code and data.
.It Li L2_RQSTS.MISS
.Pq Event 24H , Umask AAH
Counts all L2 misses for both code and data.
.It Li L2_RQSTS.REFERENCES
.Pq Event 24H , Umask FFH
Counts all L2 requests for both code and data.
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
.Pq Event 26H , Umask 01H
Counts number of L2 data demand loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
.Pq Event 26H , Umask 02H
Counts number of L2 data demand loads where the cache line to be loaded is
in the S (shared) state.
L2 demand loads are both L1D demand misses and L1D
prefetches.
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
.Pq Event 26H , Umask 04H
Counts number of L2 data demand loads where the cache line to be loaded is
in the E (exclusive) state.
L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
.Pq Event 26H , Umask 08H
Counts number of L2 data demand loads where the cache line to be loaded is
in the M (modified) state.
L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.MESI
.Pq Event 26H , Umask 0FH
Counts all L2 data demand requests.
L2 demand loads are both L1D demand
misses and L1D prefetches.
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
.Pq Event 26H , Umask 10H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
.Pq Event 26H , Umask 20H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the S (shared) state.
A prefetch RFO will miss on an S state line, while
a prefetch read will hit on an S state line.
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
.Pq Event 26H , Umask 40H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the E (exclusive) state.
.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
.Pq Event 26H , Umask 80H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the M (modified) state.
.It Li L2_DATA_RQSTS.PREFETCH.MESI
.Pq Event 26H , Umask F0H
Counts all L2 prefetch requests.
.It Li L2_DATA_RQSTS.ANY
.Pq Event 26H , Umask FFH
Counts all L2 data requests.
.It Li L2_WRITE.RFO.I_STATE
.Pq Event 27H , Umask 01H
Counts number of L2 demand store RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss.
The L1D prefetcher
does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.S_STATE
.Pq Event 27H , Umask 02H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the S (shared) state.
The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.M_STATE
.Pq Event 27H , Umask 08H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the M (modified) state.
The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.HIT
.Pq Event 27H , Umask 0EH
Counts number of L2 store RFO requests where the cache line to be loaded is
in either the S, E or M states.
The L1D prefetcher does not issue a RFO
prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.MESI
.Pq Event 27H , Umask 0FH
Counts all L2 store RFO requests.
The L1D prefetcher does not issue a RFO
prefetch.
This is a demand RFO request.
.It Li L2_WRITE.LOCK.I_STATE
.Pq Event 27H , Umask 10H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss.
.It Li L2_WRITE.LOCK.S_STATE
.Pq Event 27H , Umask 20H
Counts number of L2 lock RFO requests where the cache line to be loaded is
in the S (shared) state.
.It Li L2_WRITE.LOCK.E_STATE
.Pq Event 27H , Umask 40H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the E (exclusive) state.
.It Li L2_WRITE.LOCK.M_STATE
.Pq Event 27H , Umask 80H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the M (modified) state.
.It Li L2_WRITE.LOCK.HIT
.Pq Event 27H , Umask E0H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in either the S, E, or M state.
.It Li L2_WRITE.LOCK.MESI
.Pq Event 27H , Umask F0H
Counts all L2 demand lock RFO requests.
.It Li L1D_WB_L2.I_STATE
.Pq Event 28H , Umask 01H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the I (invalid) state, i.e. a cache miss.
.It Li L1D_WB_L2.S_STATE
.Pq Event 28H , Umask 02H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the S (shared) state.
.It Li L1D_WB_L2.E_STATE
.Pq Event 28H , Umask 04H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the E (exclusive) state.
.It Li L1D_WB_L2.M_STATE
.Pq Event 28H , Umask 08H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the M (modified) state.
.It Li L1D_WB_L2.MESI
.Pq Event 28H , Umask 0FH
Counts all L1 writebacks to the L2.
.It Li L3_LAT_CACHE.REFERENCE
.Pq Event 2EH , Umask 02H
Counts uncore Last Level Cache references.
Because of cache hierarchy, cache
sizes and other implementation-specific characteristics, value comparison to
estimate performance differences is not recommended.
See Table A-1.
.It Li L3_LAT_CACHE.MISS
.Pq Event 2EH , Umask 01H
Counts uncore Last Level Cache misses.
Because of cache hierarchy, cache sizes
and other implementation-specific characteristics, value comparison to
estimate performance differences is not recommended.
See Table A-1.
.It Li CPU_CLK_UNHALTED.THREAD_P
.Pq Event 3CH , Umask 00H
Counts the number of thread cycles while the thread is not in a halt state.
The thread enters the halt state when it is running the HLT instruction.
The core frequency may change from time to time due to power or thermal
throttling.
See Table A-1.
.It Li CPU_CLK_UNHALTED.REF_P
.Pq Event 3CH , Umask 01H
Increments at the frequency of TSC when not halted.
See Table A-1.
.It Li DTLB_MISSES.ANY
.Pq Event 49H , Umask 01H
Counts the number of misses in the STLB which cause a page walk.
.It Li DTLB_MISSES.WALK_COMPLETED
.Pq Event 49H , Umask 02H
Counts number of misses in the STLB which resulted in a completed page walk.
.It Li DTLB_MISSES.WALK_CYCLES
.Pq Event 49H , Umask 04H
Counts cycles of page walk due to misses in the STLB.
.It Li DTLB_MISSES.STLB_HIT
.Pq Event 49H , Umask 10H
Counts the number of DTLB first level misses that hit in the second level
TLB.
This event is only relevant if the core contains multiple DTLB levels.
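.Pp
One plausible use of the DTLB_MISSES counters above is to estimate the
average cost of a data page walk by dividing DTLB_MISSES.WALK_CYCLES by
DTLB_MISSES.WALK_COMPLETED.
The Python sketch below shows the arithmetic only.
The function name is hypothetical, the counter values are assumed to have
been collected over the same interval, and the ratio is an approximation
rather than a metric defined by this page.

```python
def avg_page_walk_cycles(walk_cycles, walks_completed):
    """Approximate cycles per completed page walk.

    walk_cycles     -- value read from DTLB_MISSES.WALK_CYCLES
    walks_completed -- value read from DTLB_MISSES.WALK_COMPLETED
    Returns 0.0 when no walk completed, to avoid dividing by zero.
    """
    if walks_completed == 0:
        return 0.0
    return walk_cycles / walks_completed


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(avg_page_walk_cycles(3000, 100))
```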
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 49H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li LOAD_HIT_PRE
.Pq Event 4CH , Umask 01H
Counts load operations sent to the L1 data cache while a previous SSE
prefetch instruction to the same cache line has started prefetching but has
not yet finished.
.It Li L1D_PREFETCH.REQUESTS
.Pq Event 4EH , Umask 01H
Counts number of hardware prefetch requests dispatched out of the prefetch
FIFO.
.It Li L1D_PREFETCH.MISS
.Pq Event 4EH , Umask 02H
Counts number of hardware prefetch requests that miss the L1D.
There are two prefetchers in the L1D:
a streamer, which predicts that lines sequentially after
this one should be fetched, and the IP prefetcher, which remembers access
patterns for the current instruction.
The streamer prefetcher stops on an
L1D hit, while the IP prefetcher does not.
.It Li L1D_PREFETCH.TRIGGERS
.Pq Event 4EH , Umask 04H
Counts number of prefetch requests triggered by the Finite State Machine and
pushed into the prefetch FIFO.
Some of the prefetch requests are dropped due
to overwrites or competition between the IP index prefetcher and streamer
prefetcher.
The prefetch FIFO contains 4 entries.
.It Li EPT.WALK_CYCLES
.Pq Event 4FH , Umask 10H
Counts Extended Page walk cycles.
.It Li L1D.REPL
.Pq Event 51H , Umask 01H
Counts the number of lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_REPL
.Pq Event 51H , Umask 02H
Counts the number of modified lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_EVICT
.Pq Event 51H , Umask 04H
Counts the number of modified lines evicted from the L1 data cache due to
replacement.
Counter 0, 1 only.
.It Li L1D.M_SNOOP_EVICT
.Pq Event 51H , Umask 08H
Counts the number of modified lines evicted from the L1 data cache due to
snoop HITM intervention.
Counter 0, 1 only.
.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
.Pq Event 52H , Umask 01H
Counts the number of cacheable load lock speculated instructions accepted
into the fill buffer.
.It Li L1D_CACHE_LOCK_FB_HIT
.Pq Event 53H , Umask 01H
Counts the number of cacheable load lock speculated or retired instructions
accepted into the fill buffer.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
.Pq Event 60H , Umask 01H
Counts weighted cycles of offcore demand data read requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
.Pq Event 60H , Umask 02H
Counts weighted cycles of offcore demand code read requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
.Pq Event 60H , Umask 04H
Counts weighted cycles of offcore demand RFO requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
.Pq Event 60H , Umask 08H
Counts weighted cycles of offcore read requests of any kind.
Includes L2 prefetch requests.
Counter 0.
.It Li CACHE_LOCK_CYCLES.L1D_L2
.Pq Event 63H , Umask 01H
Cycle count during which the L1D and L2 are locked.
A lock is asserted when
there is a locked memory access, due to uncacheable memory, a locked
operation that spans two cache lines, or a page walk from an uncacheable
page table.
Counter 0, 1 only.
L1D and L2 locks have a very high performance penalty and
it is highly recommended to avoid such accesses.
.It Li CACHE_LOCK_CYCLES.L1D
.Pq Event 63H , Umask 02H
Counts the number of cycles that a cache line in the L1 data cache unit is
locked.
Counter 0, 1 only.
.It Li IO_TRANSACTIONS
.Pq Event 6CH , Umask 01H
Counts the number of completed I/O transactions.
.It Li L1I.HITS
.Pq Event 80H , Umask 01H
Counts all instruction fetches that hit the L1 instruction cache.
.It Li L1I.MISSES
.Pq Event 80H , Umask 02H
Counts all instruction fetches that miss the L1I cache.
This includes
instruction cache misses, streaming buffer misses, victim cache misses and
uncacheable fetches.
An instruction fetch miss is counted only once and not
once for every cycle it is outstanding.
.It Li L1I.READS
.Pq Event 80H , Umask 03H
Counts all instruction fetches, including uncacheable fetches that bypass
the L1I.
.It Li L1I.CYCLES_STALLED
.Pq Event 80H , Umask 04H
Counts cycles for which an instruction fetch stalls due to an L1I cache miss,
ITLB miss or ITLB fault.
.It Li LARGE_ITLB.HIT
.Pq Event 82H , Umask 01H
Counts number of large ITLB hits.
.It Li ITLB_MISSES.ANY
.Pq Event 85H , Umask 01H
Counts the number of misses in all levels of the ITLB which cause a page
walk.
.It Li ITLB_MISSES.WALK_COMPLETED
.Pq Event 85H , Umask 02H
Counts number of misses in all levels of the ITLB which resulted in a
completed page walk.
.It Li ITLB_MISSES.WALK_CYCLES
.Pq Event 85H , Umask 04H
Counts ITLB miss page walk cycles.
.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 85H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li ILD_STALL.LCP
.Pq Event 87H , Umask 01H
Counts cycles the Instruction Length Decoder stalls due to length changing
prefixes: 66, 67 or REX.W (for EM64T) instructions which change the length
of the decoded instruction.
.It Li ILD_STALL.MRU
.Pq Event 87H , Umask 02H
Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
Most Recently Used (MRU) bypass.
.It Li ILD_STALL.IQ_FULL
.Pq Event 87H , Umask 04H
Stall cycles due to a full instruction queue.
.It Li ILD_STALL.REGEN
.Pq Event 87H , Umask 08H
Counts the number of regen stalls.
.It Li ILD_STALL.ANY
.Pq Event 87H , Umask 0FH
Counts any cycles the Instruction Length Decoder is stalled.
.It Li BR_INST_EXEC.COND
.Pq Event 88H , Umask 01H
Counts the number of conditional near branch instructions executed, but not
necessarily retired.
.It Li BR_INST_EXEC.DIRECT
.Pq Event 88H , Umask 02H
Counts all unconditional near branch instructions excluding calls and
indirect branches.
.It Li BR_INST_EXEC.INDIRECT_NON_CALL
.Pq Event 88H , Umask 04H
Counts the number of executed indirect near branch instructions that are not
calls.
.It Li BR_INST_EXEC.NON_CALLS
.Pq Event 88H , Umask 07H
Counts all non-call near branch instructions executed, but not necessarily
retired.
.It Li BR_INST_EXEC.RETURN_NEAR
.Pq Event 88H , Umask 08H
Counts indirect near branches that have a return mnemonic.
.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
.Pq Event 88H , Umask 10H
Counts executed unconditional near call branch instructions, excluding
non-call branches.
.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
.Pq Event 88H , Umask 20H
Counts executed indirect near calls, including both register and memory
indirect.
.It Li BR_INST_EXEC.NEAR_CALLS
.Pq Event 88H , Umask 30H
Counts all near call branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.TAKEN
.Pq Event 88H , Umask 40H
Counts taken near branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.ANY
.Pq Event 88H , Umask 7FH
Counts all near executed branches (not necessarily retired).
This includes only instructions and not micro-op branches.
Frequent branching is not necessarily a major performance issue.
However, frequent branch mispredictions may be a problem.
.It Li BR_MISP_EXEC.COND
.Pq Event 89H , Umask 01H
Counts the number of mispredicted conditional near branch instructions
executed, but not necessarily retired.
.It Li BR_MISP_EXEC.DIRECT
.Pq Event 89H , Umask 02H
Counts mispredicted macro unconditional near branch instructions, excluding
calls and indirect branches (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
.Pq Event 89H , Umask 04H
Counts the number of executed mispredicted indirect near branch instructions
that are not calls.
.It Li BR_MISP_EXEC.NON_CALLS
.Pq Event 89H , Umask 07H
Counts mispredicted non-call near branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.RETURN_NEAR
.Pq Event 89H , Umask 08H
Counts mispredicted indirect branches that have a near return mnemonic.
.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
.Pq Event 89H , Umask 10H
Counts mispredicted non-indirect near calls executed (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
.Pq Event 89H , Umask 20H
Counts mispredicted indirect near calls executed, including both register
and memory indirect.
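.Pp
Because each BR_MISP_EXEC umask mirrors the BR_INST_EXEC umask of the same
name, a misprediction ratio can be formed per branch class, for example
BR_MISP_EXEC.COND divided by BR_INST_EXEC.COND.
The Python sketch below shows that calculation.
The function name is hypothetical, and both counts are assumed to come from
the same measurement interval.

```python
def mispredict_ratio(mispredicted, executed):
    """Fraction of executed branches of one class that were mispredicted.

    mispredicted -- count from a BR_MISP_EXEC.* event (e.g. .COND)
    executed     -- count from the matching BR_INST_EXEC.* event
    Returns 0.0 when no branch of that class was executed.
    """
    if executed == 0:
        return 0.0
    return mispredicted / executed


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    print(mispredict_ratio(25, 100))
```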
.It Li BR_MISP_EXEC.NEAR_CALLS
.Pq Event 89H , Umask 30H
Counts all mispredicted near call branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.TAKEN
.Pq Event 89H , Umask 40H
Counts executed mispredicted near branches that are taken, but not
necessarily retired.
.It Li BR_MISP_EXEC.ANY
.Pq Event 89H , Umask 7FH
Counts the number of mispredicted near branch instructions that were
executed, but not necessarily retired.
.It Li RESOURCE_STALLS.ANY
.Pq Event A2H , Umask 01H
Counts the number of Allocator resource related stalls.
Includes register renaming buffer entries and memory buffer entries.
In addition to resource related stalls, this event counts some other events.
Includes stalls arising
during branch misprediction recovery, such as if retirement of the
mispredicted branch is delayed, and stalls arising while the store buffer is
draining from synchronizing operations.
Does not include stalls due to SuperQ (off core) queue full, too many cache
misses, etc.
.It Li RESOURCE_STALLS.LOAD
.Pq Event A2H , Umask 02H
Counts the cycles of stall due to lack of load buffer for load operation.
.It Li RESOURCE_STALLS.RS_FULL
.Pq Event A2H , Umask 04H
This event counts the number of cycles when the number of instructions in
the pipeline waiting for execution reaches the limit the processor can
handle.
A high count of this event indicates that there are long latency
operations in the pipe (possibly load and store operations that miss the L2
cache, or instructions dependent upon instructions further down the pipeline
that have yet to retire).
When RS is full, new instructions cannot enter the reservation station and
start execution.
.It Li RESOURCE_STALLS.STORE
.Pq Event A2H , Umask 08H
This event counts the number of cycles that a resource related stall will
occur due to the number of store instructions reaching the limit of the
pipeline (i.e. all store buffers are used).
The stall ends when a store
instruction commits its data to the cache or memory.
.It Li RESOURCE_STALLS.ROB_FULL
.Pq Event A2H , Umask 10H
Counts the cycles of stall due to the re-order buffer being full.
.It Li RESOURCE_STALLS.FPCW
.Pq Event A2H , Umask 20H
Counts the number of cycles while execution was stalled due to writing the
floating-point unit (FPU) control word.
.It Li RESOURCE_STALLS.MXCSR
.Pq Event A2H , Umask 40H
Stalls due to the MXCSR register rename occurring too close to a previous
MXCSR rename.
The MXCSR provides control and status for the MMX registers.
.It Li RESOURCE_STALLS.OTHER
.Pq Event A2H , Umask 80H
Counts the number of cycles while execution was stalled due to other
resource issues.
.It Li MACRO_INSTS.FUSIONS_DECODED
.Pq Event A6H , Umask 01H
Counts the number of instructions decoded that are macro-fused but not
necessarily executed or retired.
.It Li BACLEAR_FORCE_IQ
.Pq Event A7H , Umask 01H
Counts number of times a BACLEAR was forced by the Instruction Queue.
The IQ is also responsible for providing conditional branch prediction
direction based on a static scheme and dynamic data provided by the L2
Branch Prediction Unit.
If the conditional branch target is not found in the Target
Array and the IQ predicts that the branch is taken, then the IQ will force
the Branch Address Calculator to issue a BACLEAR.
Each BACLEAR asserted by
the BAC generates approximately an 8 cycle bubble in the instruction fetch
pipeline.
.It Li LSD.UOPS
.Pq Event A8H , Umask 01H
Counts the number of micro-ops delivered by loop stream detector.
Use cmask=1 and invert to count cycles.
.It Li ITLB_FLUSH
.Pq Event AEH , Umask 01H
Counts the number of ITLB flushes.
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
.Pq Event B0H , Umask 01H
Counts number of offcore demand data read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
.Pq Event B0H , Umask 02H
Counts number of offcore demand code read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.RFO
.Pq Event B0H , Umask 04H
Counts number of offcore demand RFO requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.READ
.Pq Event B0H , Umask 08H
Counts number of offcore read requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.RFO
.Pq Event B0H , Umask 10H
Counts number of offcore RFO requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
.Pq Event B0H , Umask 40H
Counts number of L1D writebacks to the uncore.
.It Li OFFCORE_REQUESTS.ANY
.Pq Event B0H , Umask 80H
Counts all offcore requests.
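.Pp
The OFFCORE_REQUESTS_OUTSTANDING events count weighted cycles, while the
OFFCORE_REQUESTS events above count the requests themselves.
Under the assumption that "weighted cycles" means the number of outstanding
requests summed over every cycle, dividing one by the other approximates the
average number of cycles a request stays outstanding.
The Python sketch below shows that arithmetic for demand data reads; the
function name is hypothetical and the interpretation is not confirmed by
this page.

```python
def avg_outstanding_cycles(weighted_cycles, requests):
    """Approximate average cycles a request remained outstanding.

    weighted_cycles -- OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA count
    requests        -- OFFCORE_REQUESTS.DEMAND.READ_DATA count
    Both counts are assumed to cover the same measurement interval.
    """
    if requests <= 0:
        raise ValueError("requests must be positive")
    return weighted_cycles / requests


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    print(avg_outstanding_cycles(12000, 400))
```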
.It Li UOPS_EXECUTED.PORT0
.Pq Event B1H , Umask 01H
Counts number of Uops executed that were issued on port 0.
Port 0 handles integer arithmetic, SIMD and FP add Uops.
.It Li UOPS_EXECUTED.PORT1
.Pq Event B1H , Umask 02H
Counts number of Uops executed that were issued on port 1.
Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and
FP divide Uops.
.It Li UOPS_EXECUTED.PORT2_CORE
.Pq Event B1H , Umask 04H
Counts number of Uops executed that were issued on port 2.
Port 2 handles the load Uops.
This is a core count only and cannot be collected per
thread.
.It Li UOPS_EXECUTED.PORT3_CORE
.Pq Event B1H , Umask 08H
Counts number of Uops executed that were issued on port 3.
Port 3 handles store Uops.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT4_CORE
.Pq Event B1H , Umask 10H
Counts number of Uops executed that were issued on port 4.
Port 4 handles the value to be stored for the store Uops issued on port 3.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
.Pq Event B1H , Umask 1FH
Counts number of cycles in which one or more uops that were issued on
ports 0-4 are being executed.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT5
.Pq Event B1H , Umask 20H
Counts number of Uops executed that were issued on port 5.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
.Pq Event B1H , Umask 3FH
Counts number of cycles in which one or more uops are being executed on any
port.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT015
.Pq Event B1H , Umask 40H
Counts number of Uops executed that were issued on port 0, 1, or 5.
Use cmask=1, invert=1 to count stall cycles.
.It Li UOPS_EXECUTED.PORT234
.Pq Event B1H , Umask 80H
Counts number of Uops executed that were issued on port 2, 3, or 4.
.It Li OFFCORE_REQUESTS_SQ_FULL
.Pq Event B2H , Umask 01H
Counts number of cycles the SQ is full to handle off-core requests.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
.Pq Event B3H , Umask 01H
Counts weighted cycles of snoopq requests for data.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
.Pq Event B3H , Umask 02H
Counts weighted cycles of snoopq invalidate requests.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
.Pq Event B3H , Umask 04H
Counts weighted cycles of snoopq requests for code.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS.CODE
.Pq Event B4H , Umask 01H
Counts the number of snoop code requests.
.It Li SNOOPQ_REQUESTS.DATA
.Pq Event B4H , Umask 02H
Counts the number of snoop data requests.
.It Li SNOOPQ_REQUESTS.INVALIDATE
.Pq Event B4H , Umask 04H
Counts the number of snoop invalidate requests.
.It Li OFF_CORE_RESPONSE_0
.Pq Event B7H , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Requires programming MSR 01A6H.
.It Li SNOOP_RESPONSE.HIT
.Pq Event B8H , Umask 01H
Counts HIT snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITE
.Pq Event B8H , Umask 02H
Counts HIT E snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITM
.Pq Event B8H , Umask 04H
Counts HIT M snoop response sent by this thread in response to a snoop
request.
.It Li OFF_CORE_RESPONSE_1
.Pq Event BBH , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Use MSR 01A7H.
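.Pp
Several entries above recommend setting cmask=1, optionally with invert, to
turn an occurrence count into a cycle count.
The Python sketch below shows how an event code, umask, cmask and invert
flag fit together in the Intel architectural event-select register layout
(event in bits 0-7, umask in bits 8-15, USR bit 16, OS bit 17, EN bit 22,
INV bit 23, cmask in bits 24-31).
The helper
.Li encode_evtsel
is hypothetical; libpmc performs the actual programming, and this sketch
only illustrates the bit arithmetic.

```python
# Bit positions of the architectural event-select register fields.
EVTSEL_USR = 1 << 16   # count in user mode
EVTSEL_OS = 1 << 17    # count in kernel mode
EVTSEL_EN = 1 << 22    # enable the counter
EVTSEL_INV = 1 << 23   # invert the cmask comparison


def encode_evtsel(event, umask, cmask=0, inv=False,
                  usr=True, os=True, enable=True):
    """Pack event-select fields into one integer (illustrative helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if usr:
        value |= EVTSEL_USR
    if os:
        value |= EVTSEL_OS
    if enable:
        value |= EVTSEL_EN
    if inv:
        value |= EVTSEL_INV
    return value


if __name__ == "__main__":
    # SNOOPQ_REQUESTS_OUTSTANDING.DATA (B3H/01H) with cmask=1.
    print(hex(encode_evtsel(0xB3, 0x01, cmask=1)))
    # LSD.UOPS (A8H/01H) with cmask=1 and invert.
    print(hex(encode_evtsel(0xA8, 0x01, cmask=1, inv=True)))
```

With cmask=1 the counter increments once per cycle in which at least one
event occurred; adding invert counts the cycles in which none occurred.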
.It Li INST_RETIRED.ANY_P
.Pq Event C0H , Umask 01H
See Table A-1.
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
INST_RETIRED.ANY_P is counted by a programmable counter and is an
architectural performance event.
Event is supported if CPUID.A.EBX[1] = 0.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
count as retired instructions.
.It Li INST_RETIRED.X87
.Pq Event C0H , Umask 02H
Counts the number of floating point computational operations retired:
floating point computational operations executed by the assist handler and
sub-operations of complex floating point instructions like transcendental
instructions.
.It Li INST_RETIRED.MMX
.Pq Event C0H , Umask 04H
Counts the number of retired MMX instructions.
.It Li UOPS_RETIRED.ANY
.Pq Event C2H , Umask 01H
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
others=1; maximum count of 8 per cycle).
Most instructions are composed of one or two micro-ops.
Some instructions are decoded into longer sequences,
such as repeat instructions, floating point transcendental instructions, and
assists.
Use cmask=1 to count active cycles, or cmask=1 with invert to count stalled
cycles.
.It Li UOPS_RETIRED.RETIRE_SLOTS
.Pq Event C2H , Umask 02H
Counts the number of retirement slots used each cycle.
.It Li UOPS_RETIRED.MACRO_FUSED
.Pq Event C2H , Umask 04H
Counts number of macro-fused uops retired.
.It Li MACHINE_CLEARS.CYCLES
.Pq Event C3H , Umask 01H
Counts the cycles during which machine clear is asserted.
.It Li MACHINE_CLEARS.MEM_ORDER
.Pq Event C3H , Umask 02H
Counts the number of machine clears due to memory order conflicts.
.It Li MACHINE_CLEARS.SMC
.Pq Event C3H , Umask 04H
Counts the number of times that a program writes to a code section.
Self-modifying code causes a severe penalty in all Intel 64 and IA-32
processors.
The modified cache line is written back to the L2 and L3 caches.
.It Li BR_INST_RETIRED.ANY_P
.Pq Event C4H , Umask 00H
See Table A-1.
.It Li BR_INST_RETIRED.CONDITIONAL
.Pq Event C4H , Umask 01H
Counts the number of conditional branch instructions retired.
.It Li BR_INST_RETIRED.NEAR_CALL
.Pq Event C4H , Umask 02H
Counts the number of direct and indirect near unconditional calls retired.
.It Li BR_INST_RETIRED.ALL_BRANCHES
.Pq Event C4H , Umask 04H
Counts the number of branch instructions retired.
.It Li BR_MISP_RETIRED.ANY_P
.Pq Event C5H , Umask 00H
See Table A-1.
.It Li BR_MISP_RETIRED.CONDITIONAL
.Pq Event C5H , Umask 01H
Counts mispredicted conditional branches retired.
.It Li BR_MISP_RETIRED.NEAR_CALL
.Pq Event C5H , Umask 02H
Counts mispredicted direct and indirect near unconditional calls retired.
.It Li BR_MISP_RETIRED.ALL_BRANCHES
.Pq Event C5H , Umask 04H
Counts all mispredicted branches retired.
.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
.Pq Event C7H , Umask 01H
Counts SIMD packed single-precision floating point uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
.Pq Event C7H , Umask 02H
Counts SIMD scalar single-precision floating point uops retired.
.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
.Pq Event C7H , Umask 04H
Counts SIMD packed double-precision floating point uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
.Pq Event C7H , Umask 08H
Counts SIMD scalar double-precision floating point uops retired.
.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
.Pq Event C7H , Umask 10H
Counts 128-bit SIMD vector integer uops retired.
.It Li ITLB_MISS_RETIRED
.Pq Event C8H , Umask 20H
Counts the number of retired instructions that missed the ITLB when the
instruction was fetched.
.It Li MEM_LOAD_RETIRED.L1D_HIT
.Pq Event CBH , Umask 01H
Counts number of retired loads that hit the L1 data cache.
.It Li MEM_LOAD_RETIRED.L2_HIT
.Pq Event CBH , Umask 02H
Counts number of retired loads that hit the L2 data cache.
.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
.Pq Event CBH , Umask 04H
Counts number of retired loads that hit their own, unshared lines in the L3
cache.
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
.Pq Event CBH , Umask 08H
Counts number of retired loads that hit in a sibling core's L2 (on-die
core).
Since the L3 is inclusive of all cores on the package, this is an L3 hit.
This counts both clean and modified hits.
.It Li MEM_LOAD_RETIRED.L3_MISS
.Pq Event CBH , Umask 10H
Counts number of retired loads that miss the L3 cache.
The load was satisfied by a remote socket, local memory or an IOH.
.It Li MEM_LOAD_RETIRED.HIT_LFB
.Pq Event CBH , Umask 40H
Counts number of retired loads that miss the L1D and whose address is located
in an allocated line fill buffer and will soon be committed to cache.
This counts secondary L1D misses.
.It Li MEM_LOAD_RETIRED.DTLB_MISS
.Pq Event CBH , Umask 80H
Counts the number of retired loads that missed the DTLB.
The DTLB miss is not counted if the load operation causes a fault.
This event counts loads from cacheable memory only.
The event does not count loads by software prefetches.
Counts both primary and secondary misses to the TLB.
.It Li FP_MMX_TRANS.TO_FP
.Pq Event CCH , Umask 01H
Counts the first floating-point instruction following any MMX instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.TO_MMX
.Pq Event CCH , Umask 02H
Counts the first MMX instruction following a floating-point instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.ANY
.Pq Event CCH , Umask 03H
Counts all transitions from floating point to MMX instructions and from MMX
instructions to floating point instructions.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li MACRO_INSTS.DECODED
.Pq Event D0H , Umask 01H
Counts the number of instructions decoded (but not necessarily executed or
retired).
.It Li UOPS_DECODED.STALL_CYCLES
.Pq Event D1H , Umask 01H
Counts the cycles of decoder stalls.
.It Li UOPS_DECODED.MS
.Pq Event D1H , Umask 02H
Counts the number of uops decoded by the Microcode Sequencer (MS).
The MS delivers uops when the instruction is more than 4 uops long or a
microcode assist is occurring.
.It Li UOPS_DECODED.ESP_FOLDING
.Pq Event D1H , Umask 04H
Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
ret, etc.
ESP instructions do not generate a uop to increment or decrement ESP.
Instead, they update an ESP_Offset register that keeps track of the
delta to the current value of the ESP register.
.It Li UOPS_DECODED.ESP_SYNC
.Pq Event D1H , Umask 08H
Counts number of stack pointer (ESP) sync operations where an ESP
instruction is corrected by adding the ESP offset register to the current
value of the ESP register.
.It Li RAT_STALLS.FLAGS
.Pq Event D2H , Umask 01H
Counts the number of cycles during which execution stalled due to several
reasons, one of which is a partial flag register stall.
A partial register stall may occur when two conditions are met:
1) an instruction modifies some, but not all, of the flags in the flag
register and 2) the next instruction depends on flags that were not modified
by that instruction.
.It Li RAT_STALLS.REGISTERS
.Pq Event D2H , Umask 02H
Counts the number of cycles in which instruction execution latency became
longer than the defined latency because the instruction used a register that
was partially written by a previous instruction.
.It Li RAT_STALLS.ROB_READ_PORT
.Pq Event D2H , Umask 04H
Counts the number of cycles when ROB read port stalls occurred, which did
not allow new micro-ops to enter the out-of-order pipeline.
Note that, at this stage in the pipeline, additional stalls may occur in the
same cycle and prevent the stalled micro-ops from entering the pipe.
In such a case, micro-ops retry entering the execution pipe in the next
cycle and the ROB read port stall is counted again.
.It Li RAT_STALLS.SCOREBOARD
.Pq Event D2H , Umask 08H
Counts the stall cycles due to microarchitecturally required
serialization.
Microcode scoreboarding stalls.
.It Li RAT_STALLS.ANY
.Pq Event D2H , Umask 0FH
Counts all Register Allocation Table stall cycles due to:
cycles when ROB read port stalls occurred, which did not allow new micro-ops
to enter the execution pipe;
cycles when partial register stalls occurred;
cycles when flag stalls occurred;
and cycles when floating-point unit (FPU) status word stalls occurred.
To count each of these conditions separately use the events:
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
RAT_STALLS.FPSW.
.It Li SEG_RENAME_STALLS
.Pq Event D4H , Umask 01H
Counts the number of stall cycles due to the lack of renaming resources for
the ES, DS, FS, and GS segment registers.
If a segment is renamed but not retired and a second update to the same
segment occurs, a stall occurs in the front end of the pipeline until the
renamed segment retires.
.It Li ES_REG_RENAMES
.Pq Event D5H , Umask 01H
Counts the number of times the ES segment register is renamed.
.It Li UOP_UNFUSION
.Pq Event DBH , Umask 01H
Counts unfusion events due to a floating point exception on a fused uop.
.It Li BR_INST_DECODED
.Pq Event E0H , Umask 01H
Counts the number of branch instructions decoded.
.It Li BPU_MISSED_CALL_RET
.Pq Event E5H , Umask 01H
Counts number of times the Branch Prediction Unit missed predicting a call
or return branch.
.It Li BACLEAR.CLEAR
.Pq Event E6H , Umask 01H
Counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit (BPU) cannot provide a correct prediction and this is
corrected by the Branch Address Calculator (BAC) at the front end.
This can occur if the code has so many branches that they cannot all be
consumed by the BPU.
Each BACLEAR asserted by the BAC generates approximately an 8-cycle bubble
in the instruction fetch pipeline.
The effect on total execution time depends on the surrounding code.
.It Li BACLEAR.BAD_TARGET
.Pq Event E6H , Umask 02H
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
conditional branch instructions in which there was a target hit but the
direction was wrong.
Each BACLEAR asserted by the BAC generates approximately an 8-cycle bubble
in the instruction fetch pipeline.
.It Li BPU_CLEARS.EARLY
.Pq Event E8H , Umask 01H
Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
branch after incorrectly assuming that it was not taken.
The BPU clear leads to a 2-cycle bubble in the Front End.
.It Li BPU_CLEARS.LATE
.Pq Event E8H , Umask 02H
Counts late Branch Prediction Unit clears due to Most Recently Used
conflicts.
The BPU clear leads to a 3-cycle bubble in the Front End.
.It Li THREAD_ACTIVE
.Pq Event ECH , Umask 01H
Counts cycles in which threads are active.
.It Li L2_TRANSACTIONS.LOAD
.Pq Event F0H , Umask 01H
Counts L2 load operations due to HW prefetch or demand loads.
.It Li L2_TRANSACTIONS.RFO
.Pq Event F0H , Umask 02H
Counts L2 RFO operations due to HW prefetch or demand RFOs.
.It Li L2_TRANSACTIONS.IFETCH
.Pq Event F0H , Umask 04H
Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
.It Li L2_TRANSACTIONS.PREFETCH
.Pq Event F0H , Umask 08H
Counts L2 prefetch operations.
.It Li L2_TRANSACTIONS.L1D_WB
.Pq Event F0H , Umask 10H
Counts L1D writeback operations to the L2.
.It Li L2_TRANSACTIONS.FILL
.Pq Event F0H , Umask 20H
Counts L2 cache line fill operations due to load, RFO, L1D writeback or
prefetch.
.It Li L2_TRANSACTIONS.WB
.Pq Event F0H , Umask 40H
Counts L2 writeback operations to the L3.
.It Li L2_TRANSACTIONS.ANY
.Pq Event F0H , Umask 80H
Counts all L2 cache operations.
.It Li L2_LINES_IN.S_STATE
.Pq Event F1H , Umask 02H
Counts the number of cache lines allocated in the L2 cache in the S (shared)
state.
.It Li L2_LINES_IN.E_STATE
.Pq Event F1H , Umask 04H
Counts the number of cache lines allocated in the L2 cache in the E
(exclusive) state.
.It Li L2_LINES_IN.ANY
.Pq Event F1H , Umask 07H
Counts the number of cache lines allocated in the L2 cache.
.It Li L2_LINES_OUT.DEMAND_CLEAN
.Pq Event F2H , Umask 01H
Counts L2 clean cache lines evicted by a demand request.
.It Li L2_LINES_OUT.DEMAND_DIRTY
.Pq Event F2H , Umask 02H
Counts L2 dirty (modified) cache lines evicted by a demand request.
.It Li L2_LINES_OUT.PREFETCH_CLEAN
.Pq Event F2H , Umask 04H
Counts L2 clean cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.PREFETCH_DIRTY
.Pq Event F2H , Umask 08H
Counts L2 modified cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.ANY
.Pq Event F2H , Umask 0FH
Counts all L2 cache lines evicted for any reason.
.It Li SQ_MISC.LRU_HINTS
.Pq Event F4H , Umask 04H
Counts number of Super Queue LRU hints sent to L3.
.It Li SQ_MISC.SPLIT_LOCK
.Pq Event F4H , Umask 10H
Counts the number of SQ lock splits across a cache line.
.It Li SQ_FULL_STALL_CYCLES
.Pq Event F6H , Umask 01H
Counts cycles the Super Queue is full.
Neither of the threads on this core will be able to access the uncore.
.It Li FP_ASSIST.ALL
.Pq Event F7H , Umask 01H
Counts the number of floating point operations executed that required
micro-code assist intervention.
Assists are required in the following cases:
SSE instructions (denormal input when the DAZ flag is off, or underflow
result when the FTZ flag is off);
x87 instructions (NaN or denormal loaded to a register or used as input
from memory, division by 0, or underflow output).
.It Li FP_ASSIST.OUTPUT
.Pq Event F7H , Umask 02H
Counts number of floating point micro-code assists when the output value
(destination register) is invalid.
.It Li FP_ASSIST.INPUT
.Pq Event F7H , Umask 04H
Counts number of floating point micro-code assists when the input value (one
of the source operands to an FP instruction) is invalid.
.It Li SIMD_INT_64.PACKED_MPY
.Pq Event FDH , Umask 01H
Counts number of SIMD integer 64-bit packed multiply operations.
.It Li SIMD_INT_64.PACKED_SHIFT
.Pq Event FDH , Umask 02H
Counts number of SIMD integer 64-bit packed shift operations.
.It Li SIMD_INT_64.PACK
.Pq Event FDH , Umask 04H
Counts number of SIMD integer 64-bit pack operations.
.It Li SIMD_INT_64.UNPACK
.Pq Event FDH , Umask 08H
Counts number of SIMD integer 64-bit unpack operations.
.It Li SIMD_INT_64.PACKED_LOGICAL
.Pq Event FDH , Umask 10H
Counts number of SIMD integer 64-bit logical operations.
.It Li SIMD_INT_64.PACKED_ARITH
.Pq Event FDH , Umask 20H
Counts number of SIMD integer 64-bit arithmetic operations.
.It Li SIMD_INT_64.SHUFFLE_MOVE
.Pq Event FDH , Umask 40H
Counts number of SIMD integer 64-bit shuffle or move operations.
.El
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.amd 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.corei7 3 ,
.Xr pmc.corei7uc 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmc.ucf 3 ,
.Xr pmc.westmereuc 3 ,
.Xr pmc_cpuinfo 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .