.\" Copyright (c) 2010 Fabien Thomas.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 25, 2012
.Dt PMC.WESTMERE 3
.Os
.Sh NAME
.Nm pmc.westmere
.Nd measurement events for
.Tn Intel
.Tn Westmere
family CPUs
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
.Tn Intel
.Tn "Westmere"
CPUs contain PMCs conforming to version 2 of the
.Tn Intel
performance measurement architecture.
These CPUs may contain up to three classes of PMCs:
.Bl -tag -width "Li PMC_CLASS_IAP"
.It Li PMC_CLASS_IAF
Fixed-function counters that count only one hardware event per counter.
.It Li PMC_CLASS_IAP
Programmable counters that may be configured to count one of a defined
set of hardware events.
.El
.Pp
The number of PMCs available in each class and their widths need to be
determined at run time by calling
.Xr pmc_cpuinfo 3 .
.Pp
Intel Westmere PMCs are documented in
.Rs
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
.%T "Volume 3B: System Programming Guide, Part 2"
.%N "Order Number: 253669-033US"
.%D December 2009
.%Q "Intel Corporation"
.Re
.Ss WESTMERE FIXED FUNCTION PMCS
These PMCs and their supported events are documented in
.Xr pmc.iaf 3 .
.Ss WESTMERE PROGRAMMABLE PMCS
The programmable PMCs support the following capabilities:
.Bl -column "PMC_CAP_INTERRUPT" "Support"
.It Em Capability Ta Em Support
.It PMC_CAP_CASCADE Ta \&No
.It PMC_CAP_EDGE Ta Yes
.It PMC_CAP_INTERRUPT Ta Yes
.It PMC_CAP_INVERT Ta Yes
.It PMC_CAP_READ Ta Yes
.It PMC_CAP_PRECISE Ta \&No
.It PMC_CAP_SYSTEM Ta Yes
.It PMC_CAP_TAGGING Ta \&No
.It PMC_CAP_THRESHOLD Ta Yes
.It PMC_CAP_USER Ta Yes
.It PMC_CAP_WRITE Ta Yes
.El
.Ss Event Qualifiers
Event specifiers for these PMCs support the following common
qualifiers:
.Bl -tag -width indent
.It Li rsp= Ns Ar value
Configure the Off-core Response bits.
.Bl -tag -width indent
.It Li DMND_DATA_RD
Counts the number of demand and DCU prefetch data reads of full
and partial cachelines as well as demand data page table entry
cacheline reads.
Does not count L2 data read prefetches or
instruction fetches.
.It Li DMND_RFO
Counts the number of demand and DCU prefetch reads for ownership
(RFO) requests generated by a write to data cacheline.
Does not count L2 RFO.
.It Li DMND_IFETCH
Counts the number of demand and DCU prefetch instruction cacheline
reads.
Does not count L2 code read prefetches.
.It Li WB
Counts the number of writeback (modified to exclusive) transactions.
.It Li PF_DATA_RD
Counts the number of data cacheline reads generated by L2 prefetchers.
.It Li PF_RFO
Counts the number of RFO requests generated by L2 prefetchers.
.It Li PF_IFETCH
Counts the number of code reads generated by L2 prefetchers.
.It Li OTHER
Counts one of the following transaction types, including L3 invalidate,
I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
lock, unlock, split lock.
.It Li UNCORE_HIT
L3 Hit: local or remote home requests that hit L3 cache in the uncore
with no coherency actions required (snooping).
.It Li OTHER_CORE_HIT_SNP
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where no modified
copies were found (clean).
.It Li OTHER_CORE_HITM
L3 Hit: local or remote home requests that hit L3 cache in the uncore
and were serviced by another core with a cross core snoop where modified
copies were found (HITM).
.It Li REMOTE_CACHE_FWD
L3 Miss: local homed requests that missed the L3 cache and were serviced
by forwarded data following a cross package snoop where no modified
copies were found (remote home requests are not counted).
.It Li REMOTE_DRAM
L3 Miss: remote home requests that missed the L3 cache and were serviced
by remote DRAM.
.It Li LOCAL_DRAM
L3 Miss: local home requests that missed the L3 cache and were serviced
by local DRAM.
.It Li NON_DRAM
Non-DRAM requests that were serviced by IOH.
.El
.It Li cmask= Ns Ar value
Configure the PMC to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the PMC to count the number of de-asserted to asserted
transitions of the conditions expressed by the other qualifiers.
If specified, the counter will increment only once whenever a
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li cmask
qualifier is present, making the counter increment when the number of
events per cycle is less than the value specified by the
.Dq Li cmask
qualifier.
.It Li os
Configure the PMC to count events happening at processor privilege
level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
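.Pp
As an illustration only, the interaction of the
.Dq Li cmask ,
.Dq Li inv
and
.Dq Li edge
qualifiers described above can be modelled in software.
The following is a minimal sketch, not code taken from libpmc: the names
.Li struct qual
and
.Li model_count
are hypothetical, and the model assumes that
.Dq Li inv
and
.Dq Li edge
only matter when a non-zero
.Dq Li cmask
is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the qualifiers; not a libpmc structure. */
struct qual {
	unsigned int cmask;	/* 0: no threshold, raw events are summed */
	bool inv;		/* invert the cmask comparison */
	bool edge;		/* count only false-to-true transitions */
};

/*
 * Return what a counter would accumulate over ncycles cycles, given the
 * number of events observed in each cycle.
 * Simplification (an assumption of this sketch): inv and edge are only
 * modelled when cmask is non-zero.
 */
static uint64_t
model_count(const struct qual *q, const unsigned int *events, size_t ncycles)
{
	uint64_t total = 0;
	bool prev = false;	/* value of the condition in the previous cycle */

	assert(q != NULL);
	assert(events != NULL || ncycles == 0);

	for (size_t i = 0; i < ncycles; i++) {
		bool cond;

		if (q->cmask == 0) {
			/* No threshold: every event increments the counter. */
			total += events[i];
			continue;
		}
		/* cmask: events >= cmask; with inv: events < cmask. */
		cond = q->inv ? events[i] < q->cmask : events[i] >= q->cmask;
		/* Without edge, count each true cycle; with edge, each rise. */
		if (cond && (!q->edge || !prev))
			total++;
		prev = cond;
	}
	return (total);
}
```

With per-cycle event counts of 0, 2, 3, 0, 0 and 1, this model counts two cycles for
.Dq Li cmask=2 ,
three cycles for
.Dq Li cmask=1,inv ,
and two transitions when
.Dq Li edge
is added to the latter.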
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
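.Pp
For readers who want to relate the qualifiers and the Event and Umask
numbers listed in this page to the hardware, the sketch below packs them into the
IA32_PERFEVTSELx bit layout described in the Intel manual
(event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in
bit 17, edge detect in bit 18, invert in bit 23, counter mask in bits 24-31).
The function name
.Li evtsel_encode
and the macro names are hypothetical, the enable and interrupt bits are
deliberately left out, and nothing here is meant to describe how libpmc
builds the register value internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the IA32_PERFEVTSELx layout; macro names are ours. */
#define	EVTSEL_USR	(UINT32_C(1) << 16)	/* count at privilege levels 1-3 */
#define	EVTSEL_OS	(UINT32_C(1) << 17)	/* count at privilege level 0 */
#define	EVTSEL_EDGE	(UINT32_C(1) << 18)	/* edge detect */
#define	EVTSEL_INV	(UINT32_C(1) << 23)	/* invert cmask comparison */

/*
 * Hypothetical helper: pack an event code, unit mask and qualifiers into
 * a 32-bit event-select value.  The enable (bit 22) and interrupt (bit 20)
 * bits are intentionally not set by this sketch.
 */
static uint32_t
evtsel_encode(uint8_t event, uint8_t umask, bool usr, bool os, bool edge,
    bool inv, uint8_t cmask)
{
	uint32_t v;

	/* If neither os nor usr is requested, enable both. */
	if (!usr && !os)
		usr = os = true;

	v = (uint32_t)event | ((uint32_t)umask << 8);
	if (usr)
		v |= EVTSEL_USR;
	if (os)
		v |= EVTSEL_OS;
	if (edge)
		v |= EVTSEL_EDGE;
	if (inv)
		v |= EVTSEL_INV;
	v |= (uint32_t)cmask << 24;

	/* Sanity check: the fields can be read back unchanged. */
	assert((v & 0xff) == event && ((v >> 8) & 0xff) == umask);
	return (v);
}
```

For example, Event 0EH with Umask 01H, the invert bit and a counter mask
of 1, with neither privilege qualifier given, packs to 0x0183010E under
this layout.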
.Ss Event Specifiers (Programmable PMCs)
Westmere programmable PMCs support the following events:
.Bl -tag -width indent
.It Li LOAD_BLOCK.OVERLAP_STORE
.Pq Event 03H , Umask 02H
Loads that partially overlap an earlier store
.It Li SB_DRAIN.ANY
.Pq Event 04H , Umask 07H
All Store buffer stall cycles
.It Li MISALIGN_MEMORY.STORE
.Pq Event 05H , Umask 02H
All stores referenced with a misaligned address
.It Li STORE_BLOCKS.AT_RET
.Pq Event 06H , Umask 04H
Counts number of loads delayed with at-Retirement block code.
The following
loads need to be executed at retirement and wait for all senior stores on
the same thread to be drained: load splitting across 4K boundary (page
split), load accessing uncacheable (UC or USWC) memory, load lock, and load
with page table in UC or USWC memory region.
.It Li STORE_BLOCKS.L1D_BLOCK
.Pq Event 06H , Umask 08H
Cacheable loads delayed with L1D block code
.It Li PARTIAL_ADDRESS_ALIAS
.Pq Event 07H , Umask 01H
Counts false dependency due to partial address aliasing
.It Li DTLB_LOAD_MISSES.ANY
.Pq Event 08H , Umask 01H
Counts all load misses that cause a page walk
.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
.Pq Event 08H , Umask 02H
Counts number of completed page walks due to load miss in the STLB.
.It Li DTLB_LOAD_MISSES.WALK_CYCLES
.Pq Event 08H , Umask 04H
Cycles PMH is busy with a page walk due to a load miss in the STLB.
.It Li DTLB_LOAD_MISSES.STLB_HIT
.Pq Event 08H , Umask 10H
Number of cache load STLB hits
.It Li DTLB_LOAD_MISSES.PDE_MISS
.Pq Event 08H , Umask 20H
Number of DTLB cache load misses where the low part of the linear to
physical address translation was missed.
.It Li MEM_INST_RETIRED.LOADS
.Pq Event 0BH , Umask 01H
Counts the number of instructions with an architecturally-visible load
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.STORES
.Pq Event 0BH , Umask 02H
Counts the number of instructions with an architecturally-visible store
retired on the architected path.
Used in conjunction with the ld_lat facility.
.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
.Pq Event 0BH , Umask 10H
Counts the number of instructions exceeding the latency specified with
the ld_lat facility.
Used in conjunction with the ld_lat facility.
.It Li MEM_STORE_RETIRED.DTLB_MISS
.Pq Event 0CH , Umask 01H
The event counts the number of retired stores that missed the DTLB.
The DTLB miss is not counted if the store operation causes a fault.
Does not count prefetches.
Counts both primary and secondary misses to the TLB.
.It Li UOPS_ISSUED.ANY
.Pq Event 0EH , Umask 01H
Counts the number of Uops issued by the Register Allocation Table to the
Reservation Station, i.e. the UOPs issued from the front end to the back
end.
.It Li UOPS_ISSUED.STALLED_CYCLES
.Pq Event 0EH , Umask 01H
Counts the number of cycles in which no Uops were issued by the Register
Allocation Table to the Reservation Station, i.e. no UOPs were issued from
the front end to the back end.
Set invert=1, cmask=1.
.It Li UOPS_ISSUED.FUSED
.Pq Event 0EH , Umask 02H
Counts the number of fused Uops that were issued from the Register
Allocation Table to the Reservation Station.
.It Li MEM_UNCORE_RETIRED.LOCAL_HITM
.Pq Event 0FH , Umask 02H
Load instructions retired that HIT modified data in sibling core (Precise
Event)
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT
.Pq Event 0FH , Umask 08H
Load instructions retired local dram and remote cache HIT data sources
(Precise Event)
.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
.Pq Event 0FH , Umask 10H
Load instructions retired with a data source of local DRAM or locally homed
remote cache HITM (Precise Event)
.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
.Pq Event 0FH , Umask 20H
Load instructions retired remote DRAM and remote home-remote cache HITM
(Precise Event)
.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
.Pq Event 0FH , Umask 80H
Load instructions retired I/O (Precise Event)
.It Li FP_COMP_OPS_EXE.X87
.Pq Event 10H , Umask 01H
Counts the number of FP Computational Uops Executed.
These include FADDs, FSUBs, FCOMs, FMULs, integer MULs and IMULs, FDIVs,
FPREMs, FSQRTs, integer DIVs, and IDIVs.
This event does not distinguish an FADD used in the middle
of a transcendental flow from a separate FADD instruction.
.It Li FP_COMP_OPS_EXE.MMX
.Pq Event 10H , Umask 02H
Counts number of MMX Uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP
.Pq Event 10H , Umask 04H
Counts number of SSE and SSE2 FP uops executed.
.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
.Pq Event 10H , Umask 08H
Counts number of SSE2 integer uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
.Pq Event 10H , Umask 10H
Counts number of SSE FP packed uops executed.
.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
.Pq Event 10H , Umask 20H
Counts number of SSE FP scalar uops executed.
.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
.Pq Event 10H , Umask 40H
Counts number of SSE* FP single precision uops executed.
.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
.Pq Event 10H , Umask 80H
Counts number of SSE* FP double precision uops executed.
.It Li SIMD_INT_128.PACKED_MPY
.Pq Event 12H , Umask 01H
Counts number of 128 bit SIMD integer multiply operations.
.It Li SIMD_INT_128.PACKED_SHIFT
.Pq Event 12H , Umask 02H
Counts number of 128 bit SIMD integer shift operations.
.It Li SIMD_INT_128.PACK
.Pq Event 12H , Umask 04H
Counts number of 128 bit SIMD integer pack operations.
.It Li SIMD_INT_128.UNPACK
.Pq Event 12H , Umask 08H
Counts number of 128 bit SIMD integer unpack operations.
.It Li SIMD_INT_128.PACKED_LOGICAL
.Pq Event 12H , Umask 10H
Counts number of 128 bit SIMD integer logical operations.
.It Li SIMD_INT_128.PACKED_ARITH
.Pq Event 12H , Umask 20H
Counts number of 128 bit SIMD integer arithmetic operations.
.It Li SIMD_INT_128.SHUFFLE_MOVE
.Pq Event 12H , Umask 40H
Counts number of 128 bit SIMD integer shuffle and move operations.
.It Li LOAD_DISPATCH.RS
.Pq Event 13H , Umask 01H
Counts number of loads dispatched from the Reservation Station that bypass
the Memory Order Buffer.
.It Li LOAD_DISPATCH.RS_DELAYED
.Pq Event 13H , Umask 02H
Counts the number of delayed RS dispatches at the staging latch.
If an RS dispatch cannot bypass to LB, it has another chance to dispatch
from the one-cycle delayed staging latch before it is written into the LB.
.It Li LOAD_DISPATCH.MOB
.Pq Event 13H , Umask 04H
Counts the number of loads dispatched from the Reservation Station to the
Memory Order Buffer.
.It Li LOAD_DISPATCH.ANY
.Pq Event 13H , Umask 07H
Counts all loads dispatched from the Reservation Station.
.It Li ARITH.CYCLES_DIV_BUSY
.Pq Event 14H , Umask 01H
Counts the number of cycles the divider is busy executing divide or square
root operations.
The divide can be integer, X87 or Streaming SIMD Extensions (SSE).
The square root operation can be either X87 or SSE.
Set 'edge=1, invert=1, cmask=1' to count the number of divides.
Count may be incorrect when SMT is on.
.It Li ARITH.MUL
.Pq Event 14H , Umask 02H
Counts the number of multiply operations executed.
This includes integer as
well as floating point multiply operations but excludes DPPS mul and MPSAD.
Count may be incorrect when SMT is on.
.It Li INST_QUEUE_WRITES
.Pq Event 17H , Umask 01H
Counts the number of instructions written into the instruction queue every
cycle.
.It Li INST_DECODED.DEC0
.Pq Event 18H , Umask 01H
Counts number of instructions that require decoder 0 to be decoded.
Usually, this means that the instruction maps to more than 1 uop.
.It Li TWO_UOP_INSTS_DECODED
.Pq Event 19H , Umask 01H
An instruction that generates two uops was decoded.
.It Li INST_QUEUE_WRITE_CYCLES
.Pq Event 1EH , Umask 01H
This event counts the number of cycles during which instructions are written
to the instruction queue.
Dividing the number of instructions written to the instruction queue
(INST_QUEUE_WRITES) by this counter yields the
average number of instructions decoded each cycle.
If this number is less
than four and the pipe stalls, this indicates that the decoder is failing to
decode enough instructions per cycle to sustain the 4-wide pipeline.
If SSE* instructions that are 6 bytes or longer arrive one after another,
then front end throughput may limit execution speed.
In such case,
.It Li LSD_OVERFLOW
.Pq Event 20H , Umask 01H
Number of loops that cannot stream from the instruction queue.
.It Li L2_RQSTS.LD_HIT
.Pq Event 24H , Umask 01H
Counts number of loads that hit the L2 cache.
L2 loads include both L1D demand misses as well as L1D prefetches.
L2 loads can be rejected for various reasons.
Only non rejected loads are counted.
.It Li L2_RQSTS.LD_MISS
.Pq Event 24H , Umask 02H
Counts the number of loads that miss the L2 cache.
L2 loads include both L1D demand misses as well as L1D prefetches.
.It Li L2_RQSTS.LOADS
.Pq Event 24H , Umask 03H
Counts all L2 load requests.
L2 loads include both L1D demand misses as well as L1D prefetches.
.It Li L2_RQSTS.RFO_HIT
.Pq Event 24H , Umask 04H
Counts the number of store RFO requests that hit the L2 cache.
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
prefetches.
Count includes WC memory requests, where the data is not fetched but the
permission to write the line is required.
.It Li L2_RQSTS.RFO_MISS
.Pq Event 24H , Umask 08H
Counts the number of store RFO requests that miss the L2 cache.
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
prefetches.
.It Li L2_RQSTS.RFOS
.Pq Event 24H , Umask 0CH
Counts all L2 store RFO requests.
L2 RFO requests include both L1D demand
RFO misses as well as L1D RFO prefetches.
.It Li L2_RQSTS.IFETCH_HIT
.Pq Event 24H , Umask 10H
Counts number of instruction fetches that hit the L2 cache.
L2 instruction fetches include both L1I demand misses as well as L1I
instruction prefetches.
.It Li L2_RQSTS.IFETCH_MISS
.Pq Event 24H , Umask 20H
Counts number of instruction fetches that miss the L2 cache.
L2 instruction fetches include both L1I demand misses as well as L1I
instruction prefetches.
.It Li L2_RQSTS.IFETCHES
.Pq Event 24H , Umask 30H
Counts all instruction fetches.
L2 instruction fetches include both L1I
demand misses as well as L1I instruction prefetches.
.It Li L2_RQSTS.PREFETCH_HIT
.Pq Event 24H , Umask 40H
Counts L2 prefetch hits for both code and data.
.It Li L2_RQSTS.PREFETCH_MISS
.Pq Event 24H , Umask 80H
Counts L2 prefetch misses for both code and data.
.It Li L2_RQSTS.PREFETCHES
.Pq Event 24H , Umask C0H
Counts all L2 prefetches for both code and data.
.It Li L2_RQSTS.MISS
.Pq Event 24H , Umask AAH
Counts all L2 misses for both code and data.
.It Li L2_RQSTS.REFERENCES
.Pq Event 24H , Umask FFH
Counts all L2 requests for both code and data.
.It Li L2_DATA_RQSTS.DEMAND.I_STATE
.Pq Event 26H , Umask 01H
Counts number of L2 data demand loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
L2 demand loads are both L1D demand misses and L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.S_STATE
.Pq Event 26H , Umask 02H
Counts number of L2 data demand loads where the cache line to be loaded is
in the S (shared) state.
L2 demand loads are both L1D demand misses and L1D
prefetches.
.It Li L2_DATA_RQSTS.DEMAND.E_STATE
.Pq Event 26H , Umask 04H
Counts number of L2 data demand loads where the cache line to be loaded is
in the E (exclusive) state.
L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.M_STATE
.Pq Event 26H , Umask 08H
Counts number of L2 data demand loads where the cache line to be loaded is
in the M (modified) state.
L2 demand loads are both L1D demand misses and
L1D prefetches.
.It Li L2_DATA_RQSTS.DEMAND.MESI
.Pq Event 26H , Umask 0FH
Counts all L2 data demand requests.
L2 demand loads are both L1D demand
misses and L1D prefetches.
.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
.Pq Event 26H , Umask 10H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the I (invalid) state, i.e. a cache miss.
.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
.Pq Event 26H , Umask 20H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the S (shared) state.
A prefetch RFO will miss on an S state line, while
a prefetch read will hit on an S state line.
.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
.Pq Event 26H , Umask 40H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the E (exclusive) state.
.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
.Pq Event 26H , Umask 80H
Counts number of L2 prefetch data loads where the cache line to be loaded is
in the M (modified) state.
.It Li L2_DATA_RQSTS.PREFETCH.MESI
.Pq Event 26H , Umask F0H
Counts all L2 prefetch requests.
.It Li L2_DATA_RQSTS.ANY
.Pq Event 26H , Umask FFH
Counts all L2 data requests.
.It Li L2_WRITE.RFO.I_STATE
.Pq Event 27H , Umask 01H
Counts number of L2 demand store RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss.
The L1D prefetcher
does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.S_STATE
.Pq Event 27H , Umask 02H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the S (shared) state.
The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
.It Li L2_WRITE.RFO.M_STATE
.Pq Event 27H , Umask 08H
Counts number of L2 store RFO requests where the cache line to be loaded is
in the M (modified) state.
The L1D prefetcher does not issue a RFO prefetch.
This is a demand RFO request.
5081fa7f10bSFabien Thomas.It Li L2_WRITE.RFO.HIT
5091fa7f10bSFabien Thomas.Pq Event 27H , Umask 0EH
5101fa7f10bSFabien ThomasCounts number of L2 store RFO requests where the cache line to be loaded is
511bb374ac2SGlen Barberin either the S, E or M states.
512bb374ac2SGlen BarberThe L1D prefetcher does not issue a RFO
5131fa7f10bSFabien Thomasprefetch.
5141fa7f10bSFabien ThomasThis is a demand RFO request
5151fa7f10bSFabien Thomas.It Li L2_WRITE.RFO.MESI
5161fa7f10bSFabien Thomas.Pq Event 27H , Umask 0FH
5171fa7f10bSFabien ThomasCounts all L2 store RFO requests.The L1D prefetcher does not issue a RFO
5181fa7f10bSFabien Thomasprefetch.
51951cc3ad7SGeorge V. Neville-NeilThis is a demand RFO request.
.It Li L2_WRITE.LOCK.I_STATE
.Pq Event 27H , Umask 10H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the I (invalid) state, i.e. a cache miss.
.It Li L2_WRITE.LOCK.S_STATE
.Pq Event 27H , Umask 20H
Counts number of L2 lock RFO requests where the cache line to be loaded is
in the S (shared) state.
.It Li L2_WRITE.LOCK.E_STATE
.Pq Event 27H , Umask 40H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the E (exclusive) state.
.It Li L2_WRITE.LOCK.M_STATE
.Pq Event 27H , Umask 80H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in the M (modified) state.
.It Li L2_WRITE.LOCK.HIT
.Pq Event 27H , Umask E0H
Counts number of L2 demand lock RFO requests where the cache line to be
loaded is in either the S, E, or M state.
.It Li L2_WRITE.LOCK.MESI
.Pq Event 27H , Umask F0H
Counts all L2 demand lock RFO requests.
.It Li L1D_WB_L2.I_STATE
.Pq Event 28H , Umask 01H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the I (invalid) state, i.e. a cache miss.
.It Li L1D_WB_L2.S_STATE
.Pq Event 28H , Umask 02H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the S (shared) state.
.It Li L1D_WB_L2.E_STATE
.Pq Event 28H , Umask 04H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the E (exclusive) state.
.It Li L1D_WB_L2.M_STATE
.Pq Event 28H , Umask 08H
Counts number of L1 writebacks to the L2 where the cache line to be written
is in the M (modified) state.
.It Li L1D_WB_L2.MESI
.Pq Event 28H , Umask 0FH
Counts all L1 writebacks to the L2.
.It Li L3_LAT_CACHE.REFERENCE
.Pq Event 2EH , Umask 02H
Counts uncore Last Level Cache references.
Because of cache hierarchy, cache sizes, and other implementation-specific
characteristics, value comparison to estimate performance differences is not
recommended.
See Table A-1.
.It Li L3_LAT_CACHE.MISS
.Pq Event 2EH , Umask 01H
Counts uncore Last Level Cache misses.
Because of cache hierarchy, cache sizes, and other implementation-specific
characteristics, value comparison to estimate performance differences is not
recommended.
See Table A-1.
.It Li CPU_CLK_UNHALTED.THREAD_P
.Pq Event 3CH , Umask 00H
Counts the number of thread cycles while the thread is not in a halt state.
The thread enters the halt state when it is running the HLT instruction.
The core frequency may change from time to time due to power or thermal
throttling.
See Table A-1.
.It Li CPU_CLK_UNHALTED.REF_P
.Pq Event 3CH , Umask 01H
Increments at the frequency of TSC when not halted.
See Table A-1.
.It Li DTLB_MISSES.ANY
.Pq Event 49H , Umask 01H
Counts the number of misses in the STLB which cause a page walk.
.It Li DTLB_MISSES.WALK_COMPLETED
.Pq Event 49H , Umask 02H
Counts number of misses in the STLB which resulted in a completed page walk.
.It Li DTLB_MISSES.WALK_CYCLES
.Pq Event 49H , Umask 04H
Counts cycles of page walk due to misses in the STLB.
.It Li DTLB_MISSES.STLB_HIT
.Pq Event 49H , Umask 10H
Counts the number of DTLB first level misses that hit in the second level
TLB.
This event is only relevant if the core contains multiple DTLB levels.
.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 49H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li LOAD_HIT_PRE
.Pq Event 4CH , Umask 01H
Counts load operations sent to the L1 data cache while a previous SSE
prefetch instruction to the same cache line has started prefetching but has
not yet finished.
.It Li L1D_PREFETCH.REQUESTS
.Pq Event 4EH , Umask 01H
Counts number of hardware prefetch requests dispatched out of the prefetch
FIFO.
.It Li L1D_PREFETCH.MISS
.Pq Event 4EH , Umask 02H
Counts number of hardware prefetch requests that miss the L1D.
There are two prefetchers in the L1D:
a streamer, which predicts that lines sequentially after this one should be
fetched, and the IP prefetcher, which remembers access patterns for the
current instruction.
The streamer prefetcher stops on an L1D hit, while the IP prefetcher does not.
.It Li L1D_PREFETCH.TRIGGERS
.Pq Event 4EH , Umask 04H
Counts number of prefetch requests triggered by the Finite State Machine and
pushed into the prefetch FIFO.
Some of the prefetch requests are dropped due to overwrites or competition
between the IP index prefetcher and streamer prefetcher.
The prefetch FIFO contains 4 entries.
.It Li EPT.WALK_CYCLES
.Pq Event 4FH , Umask 10H
Counts Extended Page walk cycles.
.It Li L1D.REPL
.Pq Event 51H , Umask 01H
Counts the number of lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_REPL
.Pq Event 51H , Umask 02H
Counts the number of modified lines brought into the L1 data cache.
Counter 0, 1 only.
.It Li L1D.M_EVICT
.Pq Event 51H , Umask 04H
Counts the number of modified lines evicted from the L1 data cache due to
replacement.
Counter 0, 1 only.
.It Li L1D.M_SNOOP_EVICT
.Pq Event 51H , Umask 08H
Counts the number of modified lines evicted from the L1 data cache due to
snoop HITM intervention.
Counter 0, 1 only.
.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
.Pq Event 52H , Umask 01H
Counts the number of cacheable load lock speculated instructions accepted
into the fill buffer.
.It Li L1D_CACHE_LOCK_FB_HIT
.Pq Event 53H , Umask 01H
Counts the number of cacheable load lock speculated or retired instructions
accepted into the fill buffer.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
.Pq Event 60H , Umask 01H
Counts weighted cycles of offcore demand data read requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
.Pq Event 60H , Umask 02H
Counts weighted cycles of offcore demand code read requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
.Pq Event 60H , Umask 04H
Counts weighted cycles of offcore demand RFO requests.
Does not include L2 prefetch requests.
Counter 0.
.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
.Pq Event 60H , Umask 08H
Counts weighted cycles of offcore read requests of any kind.
Includes L2 prefetch requests.
Counter 0.
.It Li CACHE_LOCK_CYCLES.L1D_L2
.Pq Event 63H , Umask 01H
Cycle count during which the L1D and L2 are locked.
A lock is asserted when there is a locked memory access, due to uncacheable
memory, a locked operation that spans two cache lines, or a page walk from an
uncacheable page table.
Counter 0, 1 only.
L1D and L2 locks have a very high performance penalty and
it is highly recommended to avoid such accesses.
.It Li CACHE_LOCK_CYCLES.L1D
.Pq Event 63H , Umask 02H
Counts the number of cycles that a cache line in the L1 data cache unit is
locked.
Counter 0, 1 only.
.It Li IO_TRANSACTIONS
.Pq Event 6CH , Umask 01H
Counts the number of completed I/O transactions.
.It Li L1I.HITS
.Pq Event 80H , Umask 01H
Counts all instruction fetches that hit the L1 instruction cache.
.It Li L1I.MISSES
.Pq Event 80H , Umask 02H
Counts all instruction fetches that miss the L1I cache.
This includes instruction cache misses, streaming buffer misses, victim cache
misses and uncacheable fetches.
An instruction fetch miss is counted only once and not
once for every cycle it is outstanding.
.It Li L1I.READS
.Pq Event 80H , Umask 03H
Counts all instruction fetches, including uncacheable fetches that bypass
the L1I.
.It Li L1I.CYCLES_STALLED
.Pq Event 80H , Umask 04H
Cycle counts for which an instruction fetch stalls due to a L1I cache miss,
ITLB miss or ITLB fault.
.It Li LARGE_ITLB.HIT
.Pq Event 82H , Umask 01H
Counts number of large ITLB hits.
.It Li ITLB_MISSES.ANY
.Pq Event 85H , Umask 01H
Counts the number of misses in all levels of the ITLB which cause a page
walk.
.It Li ITLB_MISSES.WALK_COMPLETED
.Pq Event 85H , Umask 02H
Counts number of misses in all levels of the ITLB which resulted in a
completed page walk.
.It Li ITLB_MISSES.WALK_CYCLES
.Pq Event 85H , Umask 04H
Counts ITLB miss page walk cycles.
.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
.Pq Event 85H , Umask 80H
Counts number of completed large page walks due to misses in the STLB.
.It Li ILD_STALL.LCP
.Pq Event 87H , Umask 01H
Counts cycles the Instruction Length Decoder stalls due to length-changing
prefixes: 66, 67 or REX.W (for EM64T) instructions which change the length of
the decoded instruction.
.It Li ILD_STALL.MRU
.Pq Event 87H , Umask 02H
Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
Most Recently Used (MRU) bypass.
.It Li ILD_STALL.IQ_FULL
.Pq Event 87H , Umask 04H
Stall cycles due to a full instruction queue.
.It Li ILD_STALL.REGEN
.Pq Event 87H , Umask 08H
Counts the number of regen stalls.
.It Li ILD_STALL.ANY
.Pq Event 87H , Umask 0FH
Counts any cycles the Instruction Length Decoder is stalled.
.It Li BR_INST_EXEC.COND
.Pq Event 88H , Umask 01H
Counts the number of conditional near branch instructions executed, but not
necessarily retired.
.It Li BR_INST_EXEC.DIRECT
.Pq Event 88H , Umask 02H
Counts all unconditional near branch instructions excluding calls and
indirect branches.
.It Li BR_INST_EXEC.INDIRECT_NON_CALL
.Pq Event 88H , Umask 04H
Counts the number of executed indirect near branch instructions that are not
calls.
.It Li BR_INST_EXEC.NON_CALLS
.Pq Event 88H , Umask 07H
Counts all non-call near branch instructions executed, but not necessarily
retired.
.It Li BR_INST_EXEC.RETURN_NEAR
.Pq Event 88H , Umask 08H
Counts indirect near branches that have a return mnemonic.
.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
.Pq Event 88H , Umask 10H
Counts executed unconditional near call branch instructions, excluding
non-call branches.
.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
.Pq Event 88H , Umask 20H
Counts executed indirect near calls, including both register and memory
indirect.
.It Li BR_INST_EXEC.NEAR_CALLS
.Pq Event 88H , Umask 30H
Counts all near call branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.TAKEN
.Pq Event 88H , Umask 40H
Counts taken near branches executed, but not necessarily retired.
.It Li BR_INST_EXEC.ANY
.Pq Event 88H , Umask 7FH
Counts all near executed branches (not necessarily retired).
This includes only instructions and not micro-op branches.
Frequent branching is not necessarily a major performance issue.
However, frequent branch mispredictions may be a problem.
.It Li BR_MISP_EXEC.COND
.Pq Event 89H , Umask 01H
Counts the number of mispredicted conditional near branch instructions
executed, but not necessarily retired.
.It Li BR_MISP_EXEC.DIRECT
.Pq Event 89H , Umask 02H
Counts mispredicted macro unconditional near branch instructions, excluding
calls and indirect branches (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
.Pq Event 89H , Umask 04H
Counts the number of executed mispredicted indirect near branch instructions
that are not calls.
.It Li BR_MISP_EXEC.NON_CALLS
.Pq Event 89H , Umask 07H
Counts mispredicted non-call near branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.RETURN_NEAR
.Pq Event 89H , Umask 08H
Counts mispredicted indirect branches that have a near return mnemonic.
.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
.Pq Event 89H , Umask 10H
Counts mispredicted non-indirect near calls executed (should always be 0).
.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
.Pq Event 89H , Umask 20H
Counts mispredicted indirect near calls executed, including both register
and memory indirect.
.It Li BR_MISP_EXEC.NEAR_CALLS
.Pq Event 89H , Umask 30H
Counts all mispredicted near call branches executed, but not necessarily
retired.
.It Li BR_MISP_EXEC.TAKEN
.Pq Event 89H , Umask 40H
Counts executed mispredicted near branches that are taken, but not
necessarily retired.
.It Li BR_MISP_EXEC.ANY
.Pq Event 89H , Umask 7FH
Counts the number of mispredicted near branch instructions that were
executed, but not necessarily retired.
.It Li RESOURCE_STALLS.ANY
.Pq Event A2H , Umask 01H
Counts the number of Allocator resource related stalls.
These include stalls for register renaming buffer entries and memory buffer
entries.
In addition to resource related stalls, this event counts some other events.
It includes stalls arising during branch misprediction recovery, such as when
retirement of the mispredicted branch is delayed, and stalls arising while
the store buffer is draining from synchronizing operations.
Does not include stalls due to SuperQ (off core) queue full, too many cache
misses, etc.
.It Li RESOURCE_STALLS.LOAD
.Pq Event A2H , Umask 02H
Counts the cycles of stall due to lack of load buffer for load operation.
.It Li RESOURCE_STALLS.RS_FULL
.Pq Event A2H , Umask 04H
This event counts the number of cycles when the number of instructions in
the pipeline waiting for execution reaches the limit the processor can
handle.
A high count of this event indicates that there are long latency
operations in the pipe (possibly load and store operations that miss the L2
cache, or instructions dependent upon instructions further down the pipeline
that have yet to retire).
When RS is full, new instructions cannot enter the reservation station and
start execution.
.It Li RESOURCE_STALLS.STORE
.Pq Event A2H , Umask 08H
This event counts the number of cycles that a resource related stall occurs
because the number of store instructions reaches the limit of the pipeline
(i.e. all store buffers are used).
The stall ends when a store
instruction commits its data to the cache or memory.
.It Li RESOURCE_STALLS.ROB_FULL
.Pq Event A2H , Umask 10H
Counts the cycles of stall due to re-order buffer full.
.It Li RESOURCE_STALLS.FPCW
.Pq Event A2H , Umask 20H
Counts the number of cycles while execution was stalled due to writing the
floating-point unit (FPU) control word.
.It Li RESOURCE_STALLS.MXCSR
.Pq Event A2H , Umask 40H
Stalls due to the MXCSR register rename occurring too close to a previous
MXCSR rename.
The MXCSR provides control and status for the MMX registers.
.It Li RESOURCE_STALLS.OTHER
.Pq Event A2H , Umask 80H
Counts the number of cycles while execution was stalled due to other
resource issues.
.It Li MACRO_INSTS.FUSIONS_DECODED
.Pq Event A6H , Umask 01H
Counts the number of instructions decoded that are macro-fused but not
necessarily executed or retired.
.It Li BACLEAR_FORCE_IQ
.Pq Event A7H , Umask 01H
Counts number of times a BACLEAR was forced by the Instruction Queue.
The IQ is also responsible for providing conditional branch prediction
direction based on a static scheme and dynamic data provided by the L2
Branch Prediction Unit.
If the conditional branch target is not found in the Target
Array and the IQ predicts that the branch is taken, then the IQ will force
the Branch Address Calculator to issue a BACLEAR.
Each BACLEAR asserted by the BAC generates a bubble of approximately 8 cycles
in the instruction fetch pipeline.
.It Li LSD.UOPS
.Pq Event A8H , Umask 01H
Counts the number of micro-ops delivered by the loop stream detector.
Use cmask=1 and invert to count cycles.
.It Li ITLB_FLUSH
.Pq Event AEH , Umask 01H
Counts the number of ITLB flushes.
.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
.Pq Event B0H , Umask 01H
Counts number of offcore demand data read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
.Pq Event B0H , Umask 02H
Counts number of offcore demand code read requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.DEMAND.RFO
.Pq Event B0H , Umask 04H
Counts number of offcore demand RFO requests.
Does not count L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.READ
.Pq Event B0H , Umask 08H
Counts number of offcore read requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.ANY.RFO
.Pq Event B0H , Umask 10H
Counts number of offcore RFO requests.
Includes L2 prefetch requests.
.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
.Pq Event B0H , Umask 40H
Counts number of L1D writebacks to the uncore.
.It Li OFFCORE_REQUESTS.ANY
.Pq Event B0H , Umask 80H
Counts all offcore requests.
.It Li UOPS_EXECUTED.PORT0
.Pq Event B1H , Umask 01H
Counts number of Uops executed that were issued on port 0.
Port 0 handles integer arithmetic, SIMD and FP add Uops.
.It Li UOPS_EXECUTED.PORT1
.Pq Event B1H , Umask 02H
Counts number of Uops executed that were issued on port 1.
Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and
FP divide Uops.
.It Li UOPS_EXECUTED.PORT2_CORE
.Pq Event B1H , Umask 04H
Counts number of Uops executed that were issued on port 2.
Port 2 handles the load Uops.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT3_CORE
.Pq Event B1H , Umask 08H
Counts number of Uops executed that were issued on port 3.
Port 3 handles store Uops.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT4_CORE
.Pq Event B1H , Umask 10H
Counts number of Uops executed that were issued on port 4.
Port 4 handles the value to be stored for the store Uops issued on port 3.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
.Pq Event B1H , Umask 1FH
Counts number of cycles in which one or more uops issued on ports 0-4 are
being executed.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT5
.Pq Event B1H , Umask 20H
Counts number of Uops executed that were issued on port 5.
.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
.Pq Event B1H , Umask 3FH
Counts number of cycles in which one or more uops are being executed on any
port.
This is a core count only and cannot be collected per thread.
.It Li UOPS_EXECUTED.PORT015
.Pq Event B1H , Umask 40H
Counts number of Uops executed that were issued on port 0, 1, or 5.
Use cmask=1, invert=1 to count stall cycles.
.It Li UOPS_EXECUTED.PORT234
.Pq Event B1H , Umask 80H
Counts number of Uops executed that were issued on port 2, 3, or 4.
.It Li OFFCORE_REQUESTS_SQ_FULL
.Pq Event B2H , Umask 01H
Counts number of cycles the super queue (SQ) handling off-core requests is
full.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
.Pq Event B3H , Umask 01H
Counts weighted cycles of snoopq requests for data.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
.Pq Event B3H , Umask 02H
Counts weighted cycles of snoopq invalidate requests.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
.Pq Event B3H , Umask 04H
Counts weighted cycles of snoopq requests for code.
Counter 0 only.
Use cmask=1 to count cycles not empty.
.It Li SNOOPQ_REQUESTS.CODE
.Pq Event B4H , Umask 01H
Counts the number of snoop code requests.
.It Li SNOOPQ_REQUESTS.DATA
.Pq Event B4H , Umask 02H
Counts the number of snoop data requests.
.It Li SNOOPQ_REQUESTS.INVALIDATE
.Pq Event B4H , Umask 04H
Counts the number of snoop invalidate requests.
.It Li OFF_CORE_RESPONSE_0
.Pq Event B7H , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Requires programming MSR 01A6H.
.It Li SNOOP_RESPONSE.HIT
.Pq Event B8H , Umask 01H
Counts HIT snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITE
.Pq Event B8H , Umask 02H
Counts HIT E snoop response sent by this thread in response to a snoop
request.
.It Li SNOOP_RESPONSE.HITM
.Pq Event B8H , Umask 04H
Counts HIT M snoop response sent by this thread in response to a snoop
request.
.It Li OFF_CORE_RESPONSE_1
.Pq Event BBH , Umask 01H
See Section 30.6.1.3, Off-core Response Performance Monitoring in the
Processor Core.
Use MSR 01A7H.
.It Li INST_RETIRED.ANY_P
.Pq Event C0H , Umask 01H
See Table A-1.
Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
INST_RETIRED.ANY_P is counted by a programmable counter and is an
architectural performance event.
Event is supported if CPUID.A.EBX[1] = 0.
Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
count as retired instructions.
.It Li INST_RETIRED.X87
.Pq Event C0H , Umask 02H
Counts the number of floating point computational operations retired,
including floating point computational operations executed by the assist
handler and sub-operations of complex floating point instructions such as
transcendental instructions.
.It Li INST_RETIRED.MMX
.Pq Event C0H , Umask 04H
Counts the number of retired MMX instructions.
.It Li UOPS_RETIRED.ANY
.Pq Event C2H , Umask 01H
Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
others=1; maximum count of 8 per cycle).
Most instructions are composed of one or two micro-ops.
Some instructions, such as repeat instructions, floating point
transcendental instructions, and assists, are decoded into longer sequences.
Use cmask=1 and invert to count active cycles or stalled cycles.
.It Li UOPS_RETIRED.RETIRE_SLOTS
.Pq Event C2H , Umask 02H
Counts the number of retirement slots used each cycle.
.It Li UOPS_RETIRED.MACRO_FUSED
.Pq Event C2H , Umask 04H
Counts number of macro-fused uops retired.
.It Li MACHINE_CLEARS.CYCLES
.Pq Event C3H , Umask 01H
Counts the cycles machine clear is asserted.
.It Li MACHINE_CLEARS.MEM_ORDER
.Pq Event C3H , Umask 02H
Counts the number of machine clears due to memory order conflicts.
.It Li MACHINE_CLEARS.SMC
.Pq Event C3H , Umask 04H
Counts the number of times that a program writes to a code section.
Self-modifying code causes a severe penalty in all Intel 64 and IA-32
processors.
The modified cache line is written back to the L2 and L3 caches.
.It Li BR_INST_RETIRED.ANY_P
.Pq Event C4H , Umask 00H
See Table A-1.
.It Li BR_INST_RETIRED.CONDITIONAL
.Pq Event C4H , Umask 01H
Counts the number of conditional branch instructions retired.
.It Li BR_INST_RETIRED.NEAR_CALL
.Pq Event C4H , Umask 02H
Counts the number of direct & indirect near unconditional calls retired.
.It Li BR_INST_RETIRED.ALL_BRANCHES
.Pq Event C4H , Umask 04H
Counts the number of branch instructions retired.
.It Li BR_MISP_RETIRED.ANY_P
.Pq Event C5H , Umask 00H
See Table A-1.
.It Li BR_MISP_RETIRED.CONDITIONAL
.Pq Event C5H , Umask 01H
Counts mispredicted conditional retired calls.
.It Li BR_MISP_RETIRED.NEAR_CALL
.Pq Event C5H , Umask 02H
Counts mispredicted direct & indirect near unconditional retired calls.
.It Li BR_MISP_RETIRED.ALL_BRANCHES
.Pq Event C5H , Umask 04H
Counts all mispredicted retired calls.
.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
.Pq Event C7H , Umask 01H
Counts SIMD packed single-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
.Pq Event C7H , Umask 02H
Counts SIMD scalar single-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
.Pq Event C7H , Umask 04H
Counts SIMD packed double-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
.Pq Event C7H , Umask 08H
Counts SIMD scalar double-precision floating point Uops retired.
.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
.Pq Event C7H , Umask 10H
Counts 128-bit SIMD vector integer Uops retired.
.It Li ITLB_MISS_RETIRED
.Pq Event C8H , Umask 20H
Counts the number of retired instructions that missed the ITLB when the
instruction was fetched.
.It Li MEM_LOAD_RETIRED.L1D_HIT
.Pq Event CBH , Umask 01H
Counts number of retired loads that hit the L1 data cache.
.It Li MEM_LOAD_RETIRED.L2_HIT
.Pq Event CBH , Umask 02H
Counts number of retired loads that hit the L2 data cache.
.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
.Pq Event CBH , Umask 04H
Counts number of retired loads that hit their own, unshared lines in the L3
cache.
.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
.Pq Event CBH , Umask 08H
Counts number of retired loads that hit in a sibling core's L2 (on die
core).
Since the L3 is inclusive of all cores on the package, this is an L3 hit.
This counts both clean and modified hits.
.It Li MEM_LOAD_RETIRED.L3_MISS
.Pq Event CBH , Umask 10H
Counts number of retired loads that miss the L3 cache.
The load was satisfied by a remote socket, local memory or an IOH.
.It Li MEM_LOAD_RETIRED.HIT_LFB
.Pq Event CBH , Umask 40H
Counts number of retired loads that miss the L1D and whose address is
located in an allocated line fill buffer and will soon be committed to cache.
This is counting secondary L1D misses.
.It Li MEM_LOAD_RETIRED.DTLB_MISS
.Pq Event CBH , Umask 80H
Counts the number of retired loads that missed the DTLB.
The DTLB miss is not counted if the load operation causes a fault.
This event counts loads from cacheable memory only.
The event does not count loads by software prefetches.
Counts both primary and secondary misses to the TLB.
.It Li FP_MMX_TRANS.TO_FP
.Pq Event CCH , Umask 01H
Counts the first floating-point instruction following any MMX instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.TO_MMX
.Pq Event CCH , Umask 02H
Counts the first MMX instruction following a floating-point instruction.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li FP_MMX_TRANS.ANY
.Pq Event CCH , Umask 03H
Counts all transitions from floating point to MMX instructions and from MMX
instructions to floating point instructions.
You can use this event to estimate the penalties for the transitions between
floating-point and MMX technology states.
.It Li MACRO_INSTS.DECODED
.Pq Event D0H , Umask 01H
Counts the number of instructions decoded (but not necessarily executed or
retired).
.It Li UOPS_DECODED.STALL_CYCLES
.Pq Event D1H , Umask 01H
Counts the cycles of decoder stalls.
.It Li UOPS_DECODED.MS
.Pq Event D1H , Umask 02H
Counts the number of Uops decoded by the Microcode Sequencer (MS).
The MS delivers uops when the instruction is more than 4 uops long or a
microcode assist is occurring.
.It Li UOPS_DECODED.ESP_FOLDING
.Pq Event D1H , Umask 04H
Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
ret, etc.
ESP instructions do not generate a Uop to increment or decrement ESP.
Instead, they update an ESP_Offset register that keeps track of the
delta to the current value of the ESP register.
.It Li UOPS_DECODED.ESP_SYNC
.Pq Event D1H , Umask 08H
Counts number of stack pointer (ESP) sync operations where an ESP
instruction is corrected by adding the ESP offset register to the current
value of the ESP register.
.It Li RAT_STALLS.FLAGS
.Pq Event D2H , Umask 01H
Counts the number of cycles during which execution stalled for several
reasons, one of which is a partial flag register stall.
A partial register stall may occur when two conditions are met:
1) an instruction modifies some, but not all, of the flags in the flag
register and 2) the next instruction, which depends on flags, depends on
flags that were not modified by this instruction.
.It Li RAT_STALLS.REGISTERS
.Pq Event D2H , Umask 02H
This event counts the number of cycles instruction execution latency became
longer than the defined latency because the instruction used a register that
was partially written by a previous instruction.
.It Li RAT_STALLS.ROB_READ_PORT
.Pq Event D2H , Umask 04H
Counts the number of cycles when ROB read port stalls occurred, which did
not allow new micro-ops to enter the out-of-order pipeline.
Note that, at this stage in the pipeline, additional stalls may occur at
the same cycle and prevent the stalled micro-ops from entering the pipe.
In such a case, micro-ops retry entering the execution pipe in the next
cycle and the ROB-read port stall is counted again.
.It Li RAT_STALLS.SCOREBOARD
.Pq Event D2H , Umask 08H
Counts the cycles stalled due to microarchitecturally required
serialization (microcode scoreboarding stalls).
.It Li RAT_STALLS.ANY
.Pq Event D2H , Umask 0FH
Counts all Register Allocation Table stall cycles due to:
cycles when ROB read port stalls occurred, which did not allow new
micro-ops to enter the execution pipe;
cycles when partial register stalls occurred;
cycles when flag stalls occurred;
and cycles when floating-point unit (FPU) status word stalls occurred.
To count each of these conditions separately use the events:
RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
RAT_STALLS.FPSW.
.It Li SEG_RENAME_STALLS
.Pq Event D4H , Umask 01H
Counts the number of stall cycles due to the lack of renaming resources for
the ES, DS, FS, and GS segment registers.
If a segment is renamed but not retired and a second update to the same
segment occurs, a stall occurs in the front-end of the pipeline until the
renamed segment retires.
.It Li ES_REG_RENAMES
.Pq Event D5H , Umask 01H
Counts the number of times the ES segment register is renamed.
.It Li UOP_UNFUSION
.Pq Event DBH , Umask 01H
Counts unfusion events due to floating point exception to a fused uop.
.It Li BR_INST_DECODED
.Pq Event E0H , Umask 01H
Counts the number of branch instructions decoded.
.It Li BPU_MISSED_CALL_RET
.Pq Event E5H , Umask 01H
Counts number of times the Branch Prediction Unit missed predicting a call
or return branch.
.It Li BACLEAR.CLEAR
.Pq Event E6H , Umask 01H
Counts the number of times the front end is resteered, mainly when the
Branch Prediction Unit cannot provide a correct prediction and this is
corrected by the Branch Address Calculator at the front end.
This can occur if the code has many branches such that they cannot be
consumed by the BPU.
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
in the instruction fetch pipeline.
The effect on total execution time depends on the surrounding code.
.It Li BACLEAR.BAD_TARGET
.Pq Event E6H , Umask 02H
Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
conditional branch instructions in which there was a target hit but the
direction was wrong.
Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
in the instruction fetch pipeline.
.It Li BPU_CLEARS.EARLY
.Pq Event E8H , Umask 01H
Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
branch after incorrectly assuming that it was not taken.
The BPU clear leads to a 2 cycle bubble in the Front End.
.It Li BPU_CLEARS.LATE
.Pq Event E8H , Umask 02H
Counts late Branch Prediction Unit clears due to Most Recently Used
conflicts.
The BPU clear leads to a 3 cycle bubble in the Front End.
.It Li THREAD_ACTIVE
.Pq Event ECH , Umask 01H
Counts cycles threads are active.
.It Li L2_TRANSACTIONS.LOAD
.Pq Event F0H , Umask 01H
Counts L2 load operations due to HW prefetch or demand loads.
.It Li L2_TRANSACTIONS.RFO
.Pq Event F0H , Umask 02H
Counts L2 RFO operations due to HW prefetch or demand RFOs.
.It Li L2_TRANSACTIONS.IFETCH
.Pq Event F0H , Umask 04H
Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
.It Li L2_TRANSACTIONS.PREFETCH
.Pq Event F0H , Umask 08H
Counts L2 prefetch operations.
.It Li L2_TRANSACTIONS.L1D_WB
.Pq Event F0H , Umask 10H
Counts L1D writeback operations to the L2.
.It Li L2_TRANSACTIONS.FILL
.Pq Event F0H , Umask 20H
Counts L2 cache line fill operations due to load, RFO, L1D writeback or
prefetch.
.It Li L2_TRANSACTIONS.WB
.Pq Event F0H , Umask 40H
Counts L2 writeback operations to the L3.
.It Li L2_TRANSACTIONS.ANY
.Pq Event F0H , Umask 80H
Counts all L2 cache operations.
.It Li L2_LINES_IN.S_STATE
.Pq Event F1H , Umask 02H
Counts the number of cache lines allocated in the L2 cache in the S (shared)
state.
.It Li L2_LINES_IN.E_STATE
.Pq Event F1H , Umask 04H
Counts the number of cache lines allocated in the L2 cache in the E
(exclusive) state.
.It Li L2_LINES_IN.ANY
.Pq Event F1H , Umask 07H
Counts the number of cache lines allocated in the L2 cache.
.It Li L2_LINES_OUT.DEMAND_CLEAN
.Pq Event F2H , Umask 01H
Counts L2 clean cache lines evicted by a demand request.
.It Li L2_LINES_OUT.DEMAND_DIRTY
.Pq Event F2H , Umask 02H
Counts L2 dirty (modified) cache lines evicted by a demand request.
.It Li L2_LINES_OUT.PREFETCH_CLEAN
.Pq Event F2H , Umask 04H
Counts L2 clean cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.PREFETCH_DIRTY
.Pq Event F2H , Umask 08H
Counts L2 modified cache lines evicted by a prefetch request.
.It Li L2_LINES_OUT.ANY
.Pq Event F2H , Umask 0FH
Counts all L2 cache lines evicted for any reason.
.It Li SQ_MISC.LRU_HINTS
.Pq Event F4H , Umask 04H
Counts number of Super Queue LRU hints sent to L3.
.It Li SQ_MISC.SPLIT_LOCK
.Pq Event F4H , Umask 10H
Counts the number of SQ lock splits across a cache line.
.It Li SQ_FULL_STALL_CYCLES
.Pq Event F6H , Umask 01H
Counts cycles the Super Queue is full.
Neither of the threads on this core will be able to access the uncore.
.It Li FP_ASSIST.ALL
.Pq Event F7H , Umask 01H
Counts the number of floating point operations executed that required
micro-code assist intervention.
Assists are required in the following cases:
SSE instructions (denormal input when the DAZ flag is off or underflow
result when the FTZ flag is off);
x87 instructions (NaN or denormal loaded to a register or used as input
from memory, division by 0, or underflow output).
.It Li FP_ASSIST.OUTPUT
.Pq Event F7H , Umask 02H
Counts number of floating point micro-code assists when the output value
(destination register) is invalid.
.It Li FP_ASSIST.INPUT
.Pq Event F7H , Umask 04H
Counts number of floating point micro-code assists when the input value (one
of the source operands to an FP instruction) is invalid.
.It Li SIMD_INT_64.PACKED_MPY
.Pq Event FDH , Umask 01H
Counts number of SIMD integer 64 bit packed multiply operations.
.It Li SIMD_INT_64.PACKED_SHIFT
.Pq Event FDH , Umask 02H
Counts number of SIMD integer 64 bit packed shift operations.
.It Li SIMD_INT_64.PACK
.Pq Event FDH , Umask 04H
Counts number of SIMD integer 64 bit pack operations.
.It Li SIMD_INT_64.UNPACK
.Pq Event FDH , Umask 08H
Counts number of SIMD integer 64 bit unpack operations.
.It Li SIMD_INT_64.PACKED_LOGICAL
.Pq Event FDH , Umask 10H
Counts number of SIMD integer 64 bit logical operations.
.It Li SIMD_INT_64.PACKED_ARITH
.Pq Event FDH , Umask 20H
Counts number of SIMD integer 64 bit arithmetic operations.
.It Li SIMD_INT_64.SHUFFLE_MOVE
.Pq Event FDH , Umask 40H
Counts number of SIMD integer 64 bit shift or move operations.
.El
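.Sh EXAMPLES
The event and unit mask values listed above, together with the cmask and
invert qualifiers mentioned for some events, correspond to fields of an
IA32_PERFEVTSELx register.
The following is a minimal sketch of that bit layout, assuming counting in
both user and kernel mode; it is not the code used by the library itself.
The helper name
.Fn perfevtsel_encode
and its parameters are hypothetical and chosen only for illustration.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Bit positions of the IA32_PERFEVTSELx flags used in this sketch. */
#define EVTSEL_USR	(UINT64_C(1) << 16)	/* count in user mode */
#define EVTSEL_OS	(UINT64_C(1) << 17)	/* count in kernel mode */
#define EVTSEL_EN	(UINT64_C(1) << 22)	/* enable the counter */
#define EVTSEL_INV	(UINT64_C(1) << 23)	/* invert the cmask comparison */

/*
 * Hypothetical helper, for illustration only: pack an event code, unit
 * mask, counter mask and invert flag into a single register value.
 * Counting is enabled for both user and kernel mode.
 */
static uint64_t
perfevtsel_encode(uint8_t event, uint8_t umask, uint8_t cmask, int invert)
{
	uint64_t v = 0;

	v |= (uint64_t)event;		/* bits 0-7: event select */
	v |= (uint64_t)umask << 8;	/* bits 8-15: unit mask */
	v |= (uint64_t)cmask << 24;	/* bits 24-31: counter mask */
	v |= EVTSEL_USR | EVTSEL_OS | EVTSEL_EN;
	if (invert)
		v |= EVTSEL_INV;
	return (v);
}
```
.Ed
.Pp
For UOPS_RETIRED.ANY (Event C2H, Umask 01H) with cmask=1 and invert set,
.Fn perfevtsel_encode 0xC2 0x01 1 1
yields 0x1C301C2, a configuration that counts cycles in which no micro-ops
retire.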
.Sh SEE ALSO
.Xr pmc 3 ,
.Xr pmc.atom 3 ,
.Xr pmc.core 3 ,
.Xr pmc.iaf 3 ,
.Xr pmc.ucf 3 ,
.Xr pmc.k7 3 ,
.Xr pmc.k8 3 ,
.Xr pmc.p4 3 ,
.Xr pmc.p5 3 ,
.Xr pmc.p6 3 ,
.Xr pmc.corei7 3 ,
.Xr pmc.corei7uc 3 ,
.Xr pmc.westmereuc 3 ,
.Xr pmc.soft 3 ,
.Xr pmc.tsc 3 ,
.Xr pmc_cpuinfo 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .