.\" Copyright (c) 2003-2007 Joseph Koshy.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed.  in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd November 25, 2007
.Os
.Dt PMC 3
.Sh NAME
.Nm pmc
.Nd library for accessing hardware performance monitoring counters
.Sh LIBRARY
.Lb libpmc
.Sh SYNOPSIS
.In pmc.h
.Sh DESCRIPTION
The
.Lb libpmc
provides a programming interface that allows applications to use
hardware performance counters to gather performance data about
specific processes or about the system as a whole.
The library is implemented using the lower-level facilities offered by
the
.Xr hwpmc 4
driver.
.Ss Key Concepts
Performance monitoring counters (PMCs) are represented by the library
using a software abstraction.
These
.Dq abstract
PMCs can have one of two scopes:
.Bl -bullet
.It
System scope.
These PMCs measure events in a whole-system manner, i.e., independent
of the currently executing thread.
System scope PMCs are allocated on specific CPUs and do not
migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
.Xr hwpmc 4
sysctl tunable
.Va security.bsd.unprivileged_syspmcs
is non-zero.
.It
Process scope.
These PMCs only measure hardware events when the processes they are
attached to are executing on a CPU.
In an SMP system, process scope PMCs migrate between CPUs along with
their target processes.
.El
.Pp
Orthogonal to PMC scope, PMCs may be allocated in one of two
operational modes:
.Bl -bullet
.It
Counting PMCs measure events according to their scope
(system or process).
The application needs to explicitly read these counters
to retrieve their value.
.It
Sampling PMCs cause the CPU to be periodically interrupted
and information about its state of execution to be collected.
Sampling PMCs are used to profile specific processes and kernel
threads or to profile the system as a whole.
.El
.Pp
The scope and operational mode for a software PMC are specified at
PMC allocation time.
An application is allowed to allocate multiple PMCs subject
to availability of hardware resources.
.Pp
The library uses human-readable strings to name the event being
measured by hardware.
The syntax used for specifying a hardware event along with additional
event specific qualifiers (if any) is described in detail in section
.Sx "EVENT SPECIFIERS"
below.
.Pp
PMCs are associated with the process that allocated them and
will be automatically reclaimed by the system when the process exits.
Additionally, process-scope PMCs have to be attached to one or more
target processes before they can perform measurements.
A process-scope PMC may be attached to those target processes
that its owner process would otherwise be permitted to debug.
An owner process may attach PMCs to itself, allowing
it to measure its own behavior.
Additionally, on some machine architectures, such self-attached PMCs
may be read cheaply using specialized instructions supported by the
processor.
.Pp
Certain kinds of PMCs require that a log file be configured before
they may be started.
These include:
.Bl -bullet -compact
.It
System scope sampling PMCs.
.It
Process scope sampling PMCs.
.It
Process scope counting PMCs that have been configured to report PMC
readings on process context switches or process exits.
.El
Up to one log file may be configured per owner process.
Events logged to a log file may be subsequently analyzed using the
.Xr pmclog 3
family of functions.
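.Pp
The log file requirement above can be expressed as a small predicate.
The following C sketch is illustrative only.
The enumerations
.Vt scope
and
.Vt op_mode
and the function
.Fn needs_logfile
are hypothetical names invented for this example; they are not part of
the library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; these are not the library's types. */
enum scope { SCOPE_SYSTEM, SCOPE_PROCESS };
enum op_mode { MODE_COUNTING, MODE_SAMPLING };

/*
 * Returns true if a PMC with the given scope and mode needs a log file
 * before it may be started.  'logs_switch_or_exit' says whether the PMC
 * was configured to report readings on process context switches or
 * process exits; it only matters for process scope counting PMCs.
 */
bool
needs_logfile(enum scope scope, enum op_mode mode, bool logs_switch_or_exit)
{
	assert(scope == SCOPE_SYSTEM || scope == SCOPE_PROCESS);

	if (mode == MODE_SAMPLING)
		return (true);	/* both sampling variants write to the log */
	return (scope == SCOPE_PROCESS && logs_switch_or_exit);
}
```

An application could consult such a predicate to decide whether to
configure a log file before starting a PMC.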
.Ss Supported CPUs
The CPUs known to the PMC library are named by the
.Vt "enum pmc_cputype"
enumeration.
Supported CPUs include:
.Bl -tag -width PMC_CPU_INTEL_PIII -compact
.It PMC_CPU_AMD_K7
.Tn "AMD Athlon"
CPUs.
.It PMC_CPU_AMD_K8
.Tn "AMD Athlon64"
CPUs.
.It PMC_CPU_INTEL_P6
.Tn Intel
.Tn "Pentium Pro"
CPUs.
.It PMC_CPU_INTEL_PII
.Tn "Intel Pentium II"
CPUs.
.It PMC_CPU_INTEL_PIII
.Tn "Intel Pentium III"
CPUs.
.It PMC_CPU_INTEL_PM
.Tn "Intel Pentium M"
CPUs.
.It PMC_CPU_INTEL_PIV
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss Supported PMCs
PMCs supported by this library are named by the
.Vt "enum pmc_class"
enumeration.
Supported PMC kinds include:
.Bl -tag -width PMC_CLASS_TSC -compact
.It PMC_CLASS_TSC
The timestamp counter on i386 and amd64 architecture CPUs.
.It PMC_CLASS_K7
Programmable hardware counters present in
.Tn "AMD Athlon"
CPUs.
.It PMC_CLASS_K8
Programmable hardware counters present in
.Tn "AMD Athlon64"
CPUs.
.It PMC_CLASS_P6
Programmable hardware counters present in
.Tn Intel
.Tn "Pentium Pro" ,
.Tn "Pentium II" ,
.Tn "Pentium III" ,
.Tn "Celeron" ,
and
.Tn "Pentium M"
CPUs.
.It PMC_CLASS_P4
Programmable hardware counters present in
.Tn "Intel Pentium 4"
CPUs.
.El
.Ss PMC Capabilities
Capabilities of performance monitoring hardware are denoted using
the
.Vt "enum pmc_caps"
enumeration.
Supported capabilities include:
.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
.It PMC_CAP_EDGE
The ability to count negated-to-asserted transitions of the hardware
conditions being probed for.
.It PMC_CAP_INTERRUPT
The ability to interrupt the CPU.
.It PMC_CAP_INVERT
The ability to invert the sense of the hardware conditions being
measured.
.It PMC_CAP_READ
PMC hardware allows the CPU to read performance counters.
.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
system dependent way.
.It PMC_CAP_SYSTEM
The ability to restrict counting of hardware events to when the CPU is
running privileged code.
.It PMC_CAP_THRESHOLD
The ability to ignore simultaneous hardware events below a
programmable threshold.
.It PMC_CAP_USER
The ability to restrict counting of hardware events to when the
CPU is running unprivileged code.
.It PMC_CAP_WRITE
PMC hardware allows CPUs to write to counters.
.El
.Ss Functional Grouping
This section contains a brief overview of the available functionality
in the PMC library.
Each function listed here is described further in its own manual page.
.Bl -tag -width indent
.It Administration
.Bl -tag -compact
.It Fn pmc_disable , Fn pmc_enable
Administratively disable (enable) specific performance monitoring
counter hardware.
Counters that are disabled will not be available to applications to
use.
.El
.It "Convenience Functions"
.Bl -tag -compact
.It Fn pmc_event_names_of_class
Returns a list of event names supported by a given PMC type.
.It Fn pmc_name_of_capability
Convert a
.Dv PMC_CAP_*
flag to a human-readable string.
.It Fn pmc_name_of_class
Convert a
.Dv PMC_CLASS_*
constant to a human-readable string.
.It Fn pmc_name_of_cputype
Return a human-readable name for a CPU type.
.It Fn pmc_name_of_disposition
Return a human-readable string describing a PMC's disposition.
.It Fn pmc_name_of_event
Convert a numeric event code to a human-readable string.
.It Fn pmc_name_of_mode
Convert a
.Dv PMC_MODE_*
constant to a human-readable name.
.It Fn pmc_name_of_state
Return a human-readable string describing a PMC's current state.
.El
.It "Library Initialization"
.Bl -tag -compact
.It Fn pmc_init
Initialize the library.
This function must be called before any other library function.
.El
.It "Log File Handling"
.Bl -tag -compact
.It Fn pmc_configure_logfile
Configure a log file for
.Xr hwpmc 4
to write logged events to.
.It Fn pmc_flush_logfile
Flush all pending log data in
.Xr hwpmc 4 Ns Ap s
buffers.
.It Fn pmc_writelog
Append arbitrary user data to the current log file.
.El
.It "PMC Management"
.Bl -tag -compact
.It Fn pmc_allocate , Fn pmc_release
Allocate (free) a PMC.
.It Fn pmc_attach , Fn pmc_detach
Attach (detach) a process scope PMC to a target.
.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
Read (write) a value from (to) a PMC.
.It Fn pmc_start , Fn pmc_stop
Start (stop) a software PMC.
.It Fn pmc_set
Set the reload value for a sampling PMC.
.El
.It "Queries"
.Bl -tag -compact
.It Fn pmc_capabilities
Retrieve the capabilities for a given PMC.
.It Fn pmc_cpuinfo
Retrieve information about the CPUs and PMC hardware present in the
system.
.It Fn pmc_get_driver_stats
Retrieve statistics maintained by
.Xr hwpmc 4 .
.It Fn pmc_ncpu
Determine the number of CPUs in the system.
.It Fn pmc_npmc
Return the number of hardware PMCs present in a given CPU.
.It Fn pmc_pmcinfo
Return information about the state of a given CPU's PMCs.
.It Fn pmc_width
Determine the width of a hardware counter in bits.
.El
.It "x86 Architecture Specific API"
.Bl -tag -compact
.It Fn pmc_get_msr
Returns the processor model specific register number
associated with
.Fa pmc .
Applications may then use the x86
.Ic RDPMC
instruction to directly read the contents of the PMC.
.El
.El
.Ss Signal Handling Requirements
Applications using PMCs are required to handle the following signals:
.Bl -tag -width ".Dv SIGBUS"
.It Dv SIGBUS
When the
.Xr hwpmc 4
module is unloaded using
.Xr kldunload 8 ,
processes that have PMCs allocated to them will be sent a
.Dv SIGBUS
signal.
.It Dv SIGIO
The
.Xr hwpmc 4
driver will send a PMC owning process a
.Dv SIGIO
signal if:
.Bl -bullet
.It
Any process scope PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
configured log file.
This error may be retrieved by a subsequent call to
.Fn pmc_flush_logfile .
.El
.El
.Ss Typical Program Flow
.Bl -enum
.It
An application would first invoke function
.Fn pmc_init
to allow the library to initialize itself.
.It
Signal handling would then be set up.
.It
Next the application would allocate the PMCs it desires using function
.Fn pmc_allocate .
.It
Initial values for PMCs may be set using function
.Fn pmc_set .
.It
If a log file is necessary for the PMCs to work, it would
be configured using function
.Fn pmc_configure_logfile .
.It
Process scope PMCs would then be attached to their target processes
using function
.Fn pmc_attach .
.It
The PMCs would then be started using function
.Fn pmc_start .
.It
Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
For PMCs that write events to the log file, this logged data would be
read and parsed using the
.Xr pmclog 3
family of functions.
.It
PMCs are stopped using function
.Fn pmc_stop ,
and process scope PMCs are detached from their targets using
function
.Fn pmc_detach .
.It
Before the process exits, it may release its PMCs using function
.Fn pmc_release .
Any configured log file may be closed using function
.Fn pmc_configure_logfile .
.El
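.Pp
The ordering of the steps above, and the matching teardown, can be
sketched for a single process scope counting PMC.
So that the example builds without
.Xr hwpmc 4 ,
it calls through a hypothetical table of function pointers,
.Vt "struct flow_ops" ,
whose simplified signatures are assumptions of this sketch and are not
the library's prototypes.
On a real system each member would wrap the library function named in
its comment.
The signal handling, initial value and log file steps are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table; signatures are simplified for the sketch. */
struct flow_ops {
	int (*init)(void);				/* pmc_init */
	int (*allocate)(const char *spec);		/* pmc_allocate */
	int (*attach)(int target);			/* pmc_attach */
	int (*start)(void);				/* pmc_start */
	int (*read)(unsigned long long *value);		/* pmc_read */
	int (*stop)(void);				/* pmc_stop */
	int (*detach)(int target);			/* pmc_detach */
	int (*release)(void);				/* pmc_release */
};

/*
 * Runs the flow once.  Returns 0 and stores the reading in *value, or -1
 * if any step fails.  Teardown mirrors setup: only steps that succeeded
 * are undone.
 */
int
count_once(const struct flow_ops *ops, const char *spec, int target,
    unsigned long long *value)
{
	int error = -1;

	assert(ops != NULL && spec != NULL && value != NULL);
	if (ops->init() != 0 || ops->allocate(spec) != 0)
		return (-1);
	if (ops->attach(target) != 0)
		goto release;
	if (ops->start() != 0)
		goto detach;
	error = ops->read(value);
	if (ops->stop() != 0)
		error = -1;
detach:
	if (ops->detach(target) != 0)
		error = -1;
release:
	if (ops->release() != 0)
		error = -1;
	return (error);
}

/* Recording stubs: each appends one letter so the call order is visible. */
char flow_trace[16];

static int
record(char step)
{
	size_t n = strlen(flow_trace);

	if (n + 1 < sizeof(flow_trace))
		flow_trace[n] = step;
	return (0);
}

static int stub_init(void) { return (record('i')); }
static int stub_allocate(const char *spec) { (void)spec; return (record('a')); }
static int stub_attach(int target) { (void)target; return (record('t')); }
static int stub_start(void) { return (record('s')); }
static int stub_read(unsigned long long *v) { *v = 42; return (record('r')); }
static int stub_stop(void) { return (record('p')); }
static int stub_detach(int target) { (void)target; return (record('d')); }
static int stub_release(void) { return (record('f')); }

const struct flow_ops stub_ops = {
	stub_init, stub_allocate, stub_attach, stub_start,
	stub_read, stub_stop, stub_detach, stub_release
};
```

Driving
.Fn count_once
with the recording stubs shows the call order without touching any
hardware; the value 42 returned by the read stub is a placeholder.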
.Sh EVENT SPECIFIERS
Event specifiers are strings comprising an event name, followed by
optional parameters modifying the semantics of the hardware event
being probed.
Event names are PMC architecture dependent, but the
.Nm pmc
library defines machine independent aliases for commonly used
events.
.Ss Event Name Aliases
Event name aliases are CPU architecture independent names for commonly
used events.
The following aliases are known to this version of the
.Nm pmc
library:
.Bl -tag -width indent
.It Li branches
Measure the number of branches retired.
.It Li branch-mispredicts
Measure the number of retired branches that were mispredicted.
.It Li cycles
Measure processor cycles.
This event is implemented using the processor's Time Stamp Counter
register.
.It Li dc-misses
Measure the number of data cache misses.
.It Li ic-misses
Measure the number of instruction cache misses.
.It Li instructions
Measure the number of instructions retired.
.It Li interrupts
Measure the number of interrupts seen.
.It Li unhalted-cycles
Measure the number of cycles the processor is not in a halted
or sleep state.
.El
.Ss Time Stamp Counter (TSC)
The timestamp counter is a monotonically non-decreasing counter that
counts processor cycles.
.Pp
In the i386 architecture, this counter may
be selected by requesting an event with event specifier
.Dq Li tsc .
The
.Dq Li tsc
event does not support any further qualifiers.
It can only be allocated in system-wide counting mode,
and is a read-only counter.
Multiple processes are allowed to allocate the TSC.
Once allocated, it may be read using the
.Fn pmc_read
function, or by using the RDTSC instruction.
.Ss AMD (K7) PMCs
These PMCs are present in the
.Tn "AMD Athlon"
series of CPUs and are documented in:
.Rs
.%B "AMD Athlon Processor x86 Code Optimization Guide"
.%N "Publication No. 22007"
.%D "February 2002"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K7 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li unitmask= Ns Ar mask
This qualifier is used to further qualify a select few events,
.Dq Li k7-dc-refills-from-l2 ,
.Dq Li k7-dc-refills-from-system
and
.Dq Li k7-dc-writebacks .
Here
.Ar mask
is a string of the following characters optionally separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li m
Count operations for lines in the
.Dq Modified
state.
.It Li o
Count operations for lines in the
.Dq Owner
state.
.It Li e
Count operations for lines in the
.Dq Exclusive
state.
.It Li s
Count operations for lines in the
.Dq Shared
state.
.It Li i
Count operations for lines in the
.Dq Invalid
state.
.El
.Pp
If no
.Dq Li unitmask
qualifier is specified, the default is to count events for cache
lines in any of the above states.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
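.Pp
The
.Dq Li unitmask
syntax described above, state letters optionally separated by
.Ql + ,
is simple enough to parse by hand.
The following sketch shows one way to do it.
The function
.Fn parse_k7_unitmask
and the
.Dv UM_*
bit values are hypothetical and chosen for illustration; they are not
taken from the driver headers.
Rejecting a mask with no state letters is a choice made by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit assignments for the five cache line states. */
#define UM_M 0x10u
#define UM_O 0x08u
#define UM_E 0x04u
#define UM_S 0x02u
#define UM_I 0x01u

/*
 * Parses a unitmask string such as "m+o" or "moesi".  Returns 0 and
 * stores the combined bits in *mask, or -1 if the string contains an
 * unknown character or names no state at all.
 */
int
parse_k7_unitmask(const char *s, unsigned int *mask)
{
	unsigned int bits = 0;

	assert(s != NULL && mask != NULL);
	for (; *s != '\0'; s++) {
		switch (*s) {
		case 'm': bits |= UM_M; break;
		case 'o': bits |= UM_O; break;
		case 'e': bits |= UM_E; break;
		case 's': bits |= UM_S; break;
		case 'i': bits |= UM_I; break;
		case '+': break;		/* optional separator */
		default: return (-1);
		}
	}
	if (bits == 0)
		return (-1);
	*mask = bits;
	return (0);
}
```

Because the separator is optional,
.Dq Li m+o
and
.Dq Li mo
produce the same mask.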
.Pp
The event specifiers supported on AMD K7 PMCs are:
.Bl -tag -width indent
.It Li k7-dc-accesses
Count data cache accesses.
.It Li k7-dc-misses
Count data cache misses.
.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
Count data cache refills from L2 cache.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
Count data cache refills from system memory.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
Count data cache writebacks.
This event may be further qualified using the
.Dq Li unitmask
qualifier.
.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k7-l1-and-l2-dtlb-misses
Count DTLB misses that miss in both the L1 and L2 DTLBs.
.It Li k7-misaligned-references
Count misaligned data references.
.It Li k7-ic-fetches
Count instruction cache fetches.
.It Li k7-ic-misses
Count instruction cache misses.
.It Li k7-l1-itlb-misses
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k7-l1-l2-itlb-misses
Count L1 (and L2) ITLB misses.
.It Li k7-retired-instructions
Count all retired instructions.
.It Li k7-retired-ops
Count retired ops.
.It Li k7-retired-branches
Count all retired branches (conditional, unconditional, exceptions
and interrupts).
.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
.It Li k7-retired-taken-branches
Count retired taken branches.
.It Li k7-retired-taken-branches-mispredicted
Count mispredicted taken branches that were retired.
.It Li k7-retired-far-control-transfers
Count retired far control transfers.
.It Li k7-retired-resync-branches
Count retired resync branches (non-control transfer branches).
.It Li k7-interrupts-masked-cycles
Count the number of cycles when the processor's
.Va IF
flag was zero.
.It Li k7-interrupts-masked-while-pending-cycles
Count the number of cycles interrupts were masked while pending due
to the processor's
.Va IF
flag being zero.
.It Li k7-hardware-interrupts
Count the number of taken hardware interrupts.
.El
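.Pp
An event specifier is an event name followed by comma-separated
qualifiers, as in
.Dq Li k7-dc-refills-from-l2,unitmask=m+o .
The following sketch assembles such a string into a caller-supplied
buffer.
The helper
.Fn build_event_spec
is a hypothetical name introduced here, and the sketch assumes that
further qualifiers are appended with the same comma syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Joins an event name and 'nquals' qualifiers with commas into buf.
 * Returns 0 on success, or -1 if the result does not fit in 'size' bytes.
 */
int
build_event_spec(char *buf, size_t size, const char *name,
    const char *const *quals, size_t nquals)
{
	size_t used;
	int n;

	assert(buf != NULL && name != NULL && size > 0);
	assert(strchr(name, ',') == NULL);	/* a name holds no separator */

	n = snprintf(buf, size, "%s", name);
	if (n < 0 || (size_t)n >= size)
		return (-1);
	used = (size_t)n;
	for (size_t i = 0; i < nquals; i++) {
		n = snprintf(buf + used, size - used, ",%s", quals[i]);
		if (n < 0 || (size_t)n >= size - used)
			return (-1);	/* truncated: report, do not guess */
		used += (size_t)n;
	}
	return (0);
}
```

The resulting string is what an application would hand to
.Fn pmc_allocate
as the event specifier.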
.Ss AMD (K8) PMCs
These PMCs are present in the
.Tn "AMD Athlon64"
and
.Tn "AMD Opteron"
series of CPUs.
They are documented in:
.Rs
.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
.%N "Publication No. 26094"
.%D "April 2004"
.%Q "Advanced Micro Devices, Inc."
.Re
.Pp
Event specifiers for AMD K8 PMCs can have the following optional
qualifiers:
.Bl -tag -width indent
.It Li count= Ns Ar value
Configure the counter to increment only if the number of configured
events measured in a cycle is greater than or equal to
.Ar value .
.It Li edge
Configure the counter to only count negated-to-asserted transitions
of the conditions expressed by the other qualifiers.
In other words, the counter will increment only once whenever a given
condition becomes true, irrespective of the number of clocks during
which the condition remains true.
.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
number of events per cycle is less than the value specified by
the
.Dq Li count
qualifier.
.It Li mask= Ns Ar qualifier
Many event specifiers for AMD K8 PMCs need to be additionally
qualified using a mask qualifier.
These additional qualifiers are event-specific and are documented
along with their associated event specifiers below.
.It Li os
Configure the PMC to count events happening at privilege level 0.
.It Li usr
Configure the PMC to count events occurring at privilege levels 1, 2
or 3.
.El
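.Pp
The interaction of the
.Dq Li count ,
.Dq Li inv
and
.Dq Li edge
qualifiers can be modelled in a few lines of C.
This is a software model written for illustration, not a description of
how the hardware is built; the functions
.Fn counter_increments
and
.Fn count_edges
are hypothetical, and the model assumes the condition is negated before
the first cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Models one cycle: returns true if a counter configured with
 * count=threshold would increment, given how many of the configured
 * events occurred in that cycle.  With 'inv' set the comparison flips
 * from "greater than or equal" to "less than".
 */
bool
counter_increments(unsigned int events, unsigned int threshold, bool inv)
{
	return (inv ? events < threshold : events >= threshold);
}

/*
 * Models 'edge': counts negated-to-asserted transitions in a per-cycle
 * trace of the condition, so a condition that stays true for several
 * cycles contributes only once.
 */
unsigned int
count_edges(const bool *cond, size_t ncycles)
{
	unsigned int edges = 0;
	bool prev = false;

	assert(cond != NULL || ncycles == 0);
	for (size_t i = 0; i < ncycles; i++) {
		if (cond[i] && !prev)
			edges++;
		prev = cond[i];
	}
	return (edges);
}
```

For example, with a threshold of 2 a cycle containing exactly two events
is counted normally but skipped when
.Dq Li inv
is given.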
.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
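.Pp
A
.Dq Li mask
qualifier is a
.Ql +
separated set of keywords, and the permitted keywords differ from event
to event.
The following sketch checks a mask string against the keyword set of one
event.
The function
.Fn mask_is_valid
is a hypothetical helper for this example; rejecting empty keywords is a
choice of the sketch and not documented library behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if every '+'-separated keyword in 'mask' appears in the
 * NULL-terminated 'allowed' list, and 0 otherwise.  Empty keywords, as
 * in "a++b" or a trailing '+', are rejected.
 */
int
mask_is_valid(const char *mask, const char *const *allowed)
{
	assert(mask != NULL && allowed != NULL);
	for (;;) {
		size_t len = strcspn(mask, "+");	/* current keyword */
		int found = 0;

		for (size_t i = 0; allowed[i] != NULL; i++) {
			if (strlen(allowed[i]) == len &&
			    strncmp(allowed[i], mask, len) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return (0);
		if (mask[len] == '\0')
			return (1);
		mask += len + 1;	/* step over the '+' */
	}
}
```

A caller would pass the keyword list documented for the event in
question, for instance
.Dq Li dc-fill ,
.Dq Li ic-fill
and
.Dq Li tlb-reload
for
.Dq Li k8-bu-fill-request-l2-miss .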
.Pp
The event specifiers supported on AMD K8 PMCs are:
.Bl -tag -width indent
.It Li k8-bu-cpu-clk-unhalted
Count the number of clock cycles when the CPU is not in the HLT or
STPCLK states.
.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
Count fill requests that missed in the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
Count internally generated requests to the L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li cancelled
Count cancelled requests.
.It Li dc-fill
Count data cache fill requests.
.It Li ic-fill
Count instruction cache fill requests.
.It Li tag-snoop
Count tag snoop requests.
.It Li tlb-reload
Count TLB reloads.
.El
.Pp
The default is to count all types of requests.
.It Li k8-dc-access
Count data cache accesses including microcode scratchpad accesses.
.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
Count data cache copyback operations.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
Count data cache accesses by lock instructions.
This event is only available on processors of revision C or later
vintage.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li accesses
Count data cache accesses by lock instructions.
.It Li misses
Count data cache misses by lock instructions.
.El
.Pp
The default is to count all accesses.
.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
Count the number of dispatched prefetch instructions.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li load
Count load operations.
.It Li nta
Count non-temporal operations.
.It Li store
Count store operations.
.El
.Pp
The default is to count all operations.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
Count L1 DTLB misses that are L2 DTLB hits.
.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
Count L1 DTLB misses that are also misses in the L2 DTLB.
.It Li k8-dc-microarchitectural-early-cancel-of-an-access
Count microarchitectural early cancels of data cache accesses.
.It Li k8-dc-microarchitectural-late-cancel-of-an-access
Count microarchitectural late cancels of data cache accesses.
.It Li k8-dc-misaligned-data-reference
Count misaligned data references.
.It Li k8-dc-miss
Count data cache misses.
.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
Count one bit ECC errors found by the scrubber.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li scrubber
Count scrubber detected errors.
.It Li piggyback
Count piggyback scrubber errors.
.El
.Pp
The default is to count both kinds of errors.
.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
Count data cache refills from L2 cache.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
Count data cache refills from system memory.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li exclusive
Count operations for lines in the
.Dq exclusive
state.
.It Li invalid
Count operations for lines in the
.Dq invalid
state.
.It Li modified
Count operations for lines in the
.Dq modified
state.
.It Li owner
Count operations for lines in the
.Dq owner
state.
.It Li shared
Count operations for lines in the
.Dq shared
state.
.El
.Pp
The default is to count operations for lines in all the
above states.
.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
Count the number of dispatched FPU ops.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li add-pipe-excluding-junk-ops
Count add pipe ops excluding junk ops.
.It Li add-pipe-junk-ops
Count junk ops in the add pipe.
.It Li multiply-pipe-excluding-junk-ops
Count multiply pipe ops excluding junk ops.
.It Li multiply-pipe-junk-ops
Count junk ops in the multiply pipe.
.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
.It Li store-pipe-junk-ops
Count junk ops in the store pipe.
.El
.Pp
The default is to count all types of ops.
.It Li k8-fp-cycles-with-no-fpu-ops-retired
Count cycles when no FPU ops were retired.
This event is supported in revision B and later CPUs.
.It Li k8-fp-dispatched-fpu-fast-flag-ops
Count dispatched FPU ops that use the fast flag interface.
This event is supported in revision B and later CPUs.
.It Li k8-fr-decoder-empty
Count cycles when there was nothing to dispatch (i.e., the decoder
was empty).
.It Li k8-fr-dispatch-stalls
Count all dispatch stalls.
.It Li k8-fr-dispatch-stall-for-segment-load
Count dispatch stalls for segment loads.
.It Li k8-fr-dispatch-stall-for-serialization
Count dispatch stalls for serialization.
.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
Count dispatch stalls from branch abort to retirement.
.It Li k8-fr-dispatch-stall-when-fpu-is-full
Count dispatch stalls when the FPU is full.
.It Li k8-fr-dispatch-stall-when-ls-is-full
Count dispatch stalls when the load/store unit is full.
.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
Count dispatch stalls when the reorder buffer is full.
.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
Count dispatch stalls when reservation stations are full.
.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
Count dispatch stalls when waiting for all to be quiet.
.\" XXX What does "waiting for all to be quiet" mean?
.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
Count dispatch stalls when a far control transfer or a resync branch
is pending.
.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
Count FPU exceptions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li sse-and-x87-microtraps
Count SSE and x87 microtraps.
.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
.It Li x87-reclass-microfaults
Count x87 reclass microfaults.
.El
.Pp
The default is to count all types of exceptions.
.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., when the CPU RFLAGS
field IF was zero).
.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
when INTR was asserted while CPU RFLAGS field IF was zero).
.It Li k8-fr-number-of-breakpoints-for-dr0
Count the number of breakpoints for DR0.
.It Li k8-fr-number-of-breakpoints-for-dr1
Count the number of breakpoints for DR1.
.It Li k8-fr-number-of-breakpoints-for-dr2
Count the number of breakpoints for DR2.
.It Li k8-fr-number-of-breakpoints-for-dr3
Count the number of breakpoints for DR3.
.It Li k8-fr-retired-branches
Count retired branches including exceptions and interrupts.
.It Li k8-fr-retired-branches-mispredicted
Count mispredicted retired branches.
.It Li k8-fr-retired-far-control-transfers
Count retired far control transfers (which are always mispredicted).
.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
Count retired fastpath double op instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li low-op-pos-0
Count instructions with the low op in position 0.
.It Li low-op-pos-1
Count instructions with the low op in position 1.
.It Li low-op-pos-2
Count instructions with the low op in position 2.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
Count retired FPU instructions.
This event is supported in revision B and later CPUs.
This event may be further qualified using
.Ar qualifier ,
which is a
.Ql +
separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li mmx-3dnow
Count MMX and 3DNow!\& instructions.
.It Li packed-sse-sse2
Count packed SSE and SSE2 instructions.
.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
.It Li x87
Count x87 instructions.
.El
.Pp
The default is to count all types of instructions.
.It Li k8-fr-retired-near-returns
Count retired near returns.
.It Li k8-fr-retired-near-returns-mispredicted
Count mispredicted near returns.
.It Li k8-fr-retired-resyncs
Count retired resyncs (non-control transfer branches).
.It Li k8-fr-retired-taken-hardware-interrupts
Count retired taken hardware interrupts.
.It Li k8-fr-retired-taken-branches
Count retired taken branches.
.It Li k8-fr-retired-taken-branches-mispredicted
Count retired taken branches that were mispredicted.
.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
Count retired taken branches that were mispredicted only due to an
address miscompare.
.It Li k8-fr-retired-uops
Count retired uops.
.It Li k8-fr-retired-x86-instructions
Count retired x86 instructions including exceptions and interrupts.
.It Li k8-ic-fetch
Count instruction cache fetches.
.It Li k8-ic-instruction-fetch-stall
Count cycles in stalls due to instruction fetch.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
Count L1 ITLB misses that are L2 ITLB hits.
.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
Count ITLB misses that miss in both L1 and L2 ITLBs.
.It Li k8-ic-microarchitectural-resync-by-snoop
Count microarchitectural resyncs caused by snoops.
.It Li k8-ic-miss
Count instruction cache misses.
.It Li k8-ic-refill-from-l2
Count instruction cache refills from L2 cache.
.It Li k8-ic-refill-from-system
Count instruction cache refills from system memory.
.It Li k8-ic-return-stack-hits
Count hits to the return stack.
.It Li k8-ic-return-stack-overflow
Count overflows of the return stack.
.It Li k8-ls-buffer2-full
Count load/store buffer2 full events.
.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
Count locked operations.
For revision C and later CPUs, the following qualifiers are supported:
.Pp
.Bl -tag -width indent -compact
.It Li cycles-in-request
Count the number of cycles in the lock request/grant stage.
1038.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
1041.It Li locked-instructions
1042Count the number of lock instructions executed.
1043.El
1044.Pp
1045The default is to count the number of lock instructions executed.
1046.It Li k8-ls-microarchitectural-late-cancel
1047Count microarchitectural late cancels of operations in the load/store
1048unit.
1049.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
1050Count microarchitectural resyncs caused by self-modifying code.
1051.It Li k8-ls-microarchitectural-resync-by-snoop
1052Count microarchitectural resyncs caused by snoops.
1053.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
1055.It Li k8-ls-retired-cpuid-instructions
1056Count retired CPUID instructions.
1057.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
1058Count segment register loads.
1059This event may be further qualified using
1060.Ar qualifier ,
1061which is a
1062.Ql +
1063separated set of the following keywords:
.Pp
.Bl -tag -width indent -compact
1065.It Li cs
1066Count CS register loads.
1067.It Li ds
1068Count DS register loads.
1069.It Li es
1070Count ES register loads.
1071.It Li fs
1072Count FS register loads.
1073.It Li gs
1074Count GS register loads.
1075.\" .It Li hs
1076.\" Count HS register loads.
1077.\" XXX "HS" register?
1078.It Li ss
1079Count SS register loads.
1080.El
1081.Pp
1082The default is to count all types of loads.
1083.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
1084Count memory controller bypass counter saturation events.
1085This event may be further qualified using
1086.Ar qualifier ,
1087which is a
1088.Ql +
1089separated set of the following keywords:
1090.Pp
1091.Bl -tag -width indent -compact
1092.It Li dram-controller-interface-bypass
1093Count DRAM controller interface bypass.
1094.It Li dram-controller-queue-bypass
1095Count DRAM controller queue bypass.
1096.It Li memory-controller-hi-pri-bypass
1097Count memory controller high priority bypasses.
1098.It Li memory-controller-lo-pri-bypass
1099Count memory controller low priority bypasses.
1100.El
1101.Pp
1102.It Li k8-nb-memory-controller-dram-slots-missed
1103Count memory controller DRAM command slots missed (in MemClks).
1104.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
1105Count memory controller page access events.
1106This event may be further qualified using
1107.Ar qualifier ,
1108which is a
1109.Ql +
1110separated set of the following keywords:
1111.Pp
1112.Bl -tag -width indent -compact
1113.It Li page-conflict
1114Count page conflicts.
1115.It Li page-hit
1116Count page hits.
1117.It Li page-miss
1118Count page misses.
1119.El
1120.Pp
1121The default is to count all types of events.
1122.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
1124.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
1125Count probe events.
1126This event may be further qualified using
1127.Ar qualifier ,
1128which is a
1129.Ql +
1130separated set of the following keywords:
1131.Pp
1132.Bl -tag -width indent -compact
1133.It Li probe-hit
1134Count all probe hits.
1135.It Li probe-hit-dirty-no-memory-cancel
1136Count probe hits without memory cancels.
1137.It Li probe-hit-dirty-with-memory-cancel
1138Count probe hits with memory cancels.
1139.It Li probe-miss
1140Count probe misses.
1141.El
1142.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
1143Count sized commands issued.
1144This event may be further qualified using
1145.Ar qualifier ,
1146which is a
1147.Ql +
1148separated set of the following keywords:
1149.Pp
1150.Bl -tag -width indent -compact
1151.It Li nonpostwrszbyte
1152.It Li nonpostwrszdword
1153.It Li postwrszbyte
1154.It Li postwrszdword
1155.It Li rdszbyte
1156.It Li rdszdword
1157.It Li rdmodwr
1158.El
1159.Pp
1160The default is to count all types of commands.
1161.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
1163This event may be further qualified using
1164.Ar qualifier ,
1165which is a
1166.Ql +
1167separated set of the following keywords:
1168.Pp
1169.Bl -tag -width indent -compact
1170.\" XXX doc is unclear whether these are cycle counts or event counts
1171.It Li dimm-turnaround
1172Count DIMM turnarounds.
1173.It Li read-to-write-turnaround
1174Count read to write turnarounds.
1175.It Li write-to-read-turnaround
1176Count write to read turnarounds.
1177.El
1178.Pp
1179The default is to count all types of events.
1180.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
1181.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
1182.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
1183Count events on the HyperTransport(tm) buses.
1184These events may be further qualified using
1185.Ar qualifier ,
1186which is a
1187.Ql +
1188separated set of the following keywords:
1189.Pp
1190.Bl -tag -width indent -compact
1191.It Li buffer-release
1192Count buffer release messages sent.
1193.It Li command
1194Count command messages sent.
1195.It Li data
1196Count data messages sent.
1197.It Li nop
1198Count nop messages sent.
1199.El
1200.Pp
1201The default is to count all types of messages.
1202.El
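.Pp
The
.Ql +
separated
.Li mask=
syntax used by the qualified events above can be illustrated with a
short C sketch.
The helper
.Fn build_k8_spec
shown here is hypothetical and is not part of the library.
It only assembles the event specifier string, on the assumption that the
result is later handed to an allocation call such as
.Fn pmc_allocate .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<event>,mask=<kw1>+<kw2>..." into buf.
 * With nkw == 0 only the event name is written, which selects the
 * documented default for that event.
 * Returns 0 on success, -1 if buf is too small.
 * build_k8_spec() is a hypothetical helper, not a libpmc function.
 */
static int
build_k8_spec(char *buf, size_t len, const char *event,
    const char *const *kw, size_t nkw)
{
	size_t i;
	int n;

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	for (i = 0; i < nkw; i++) {
		size_t used = strlen(buf);

		/* The first keyword follows ",mask=", later ones follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", kw[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
	}
	return (0);
}
```
.Ed
.Pp
For example, calling the helper with the event
.Li k8-fr-retired-fpu-instructions
and the keywords
.Li x87
and
.Li mmx-3dnow
produces the string
.Dq Li k8-fr-retired-fpu-instructions,mask=x87+mmx-3dnow .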
.Ss Intel P6 PMCs
1204Intel P6 PMCs are present in Intel
1205.Tn "Pentium Pro" ,
1206.Tn "Pentium II" ,
1207.Tn Celeron ,
1208.Tn "Pentium III"
1209and
1210.Tn "Pentium M"
1211processors.
1212.Pp
1213These CPUs have two counters.
1214Some events may only be used on specific counters and some events are
1215defined only on specific processor models.
1216.Pp
1217These PMCs are documented in
1218.Rs
1219.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
1220.%T "Volume 3: System Programming Guide"
1221.%N "Order Number 245472-012"
1222.%D 2003
1223.%Q "Intel Corporation"
1224.Re
1225.Pp
1226Some of these events are affected by processor errata described in
1227.Rs
1228.%B "Intel(R) Pentium(R) III Processor Specification Update"
1229.%N "Document Number: 244453-054"
1230.%D "April 2005"
1231.%Q "Intel Corporation"
1232.Re
1233.Pp
1234Event specifiers for Intel P6 PMCs can have the following common
1235qualifiers:
1236.Bl -tag -width indent
1237.It Li cmask= Ns Ar value
1238Configure the PMC to increment only if the number of configured
1239events measured in a cycle is greater than or equal to
1240.Ar value .
1241.It Li edge
1242Configure the PMC to count the number of deasserted to asserted
1243transitions of the conditions expressed by the other qualifiers.
1244If specified, the counter will increment only once whenever a
1245condition becomes true, irrespective of the number of clocks during
1246which the condition remains true.
1247.It Li inv
Invert the sense of comparison when the
1249.Dq Li cmask
1250qualifier is present, making the counter increment when the number of
1251events per cycle is less than the value specified by the
1252.Dq Li cmask
1253qualifier.
1254.It Li os
1255Configure the PMC to count events happening at processor privilege
1256level 0.
1257.It Li umask= Ns Ar value
1258This qualifier is used to further qualify the event selected (see
1259below).
1260.It Li usr
1261Configure the PMC to count events occurring at privilege levels 1, 2
1262or 3.
1263.El
1264.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
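.Pp
The way the common qualifiers combine can be sketched in C.
The structure
.Vt "struct p6_quals"
and the helper
.Fn build_p6_spec
are hypothetical names introduced for illustration only.
The sketch assumes that qualifiers are appended to the event name as a
comma-separated list, in the same way as the
.Li ,umask=
forms shown in this manual.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the common P6 qualifiers. */
struct p6_quals {
	int	cmask;	/* negative means "do not emit cmask=" */
	bool	edge;
	bool	inv;	/* only meaningful together with cmask */
	bool	os;
	bool	usr;
};

/*
 * Write "<event>[,cmask=N][,edge][,inv][,os][,usr]" into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Leaving both os and usr unset relies on the documented default
 * of counting at all privilege levels.
 */
static int
build_p6_spec(char *buf, size_t len, const char *event,
    const struct p6_quals *q)
{
	char cm[32] = "";
	int n;

	if (q->cmask >= 0)
		snprintf(cm, sizeof(cm), ",cmask=%d", q->cmask);
	n = snprintf(buf, len, "%s%s%s%s%s%s", event, cm,
	    q->edge ? ",edge" : "", q->inv ? ",inv" : "",
	    q->os ? ",os" : "", q->usr ? ",usr" : "");
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```
.Ed
.Pp
With
.Va cmask
set to 2 and the
.Va inv
and
.Va usr
fields set, the event
.Li p6-inst-retired
yields
.Dq Li p6-inst-retired,cmask=2,inv,usr ,
which requests user-mode cycles in which fewer than two instructions
retired.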
1270.Pp
1271The event specifiers supported by Intel P6 PMCs are:
1272.Bl -tag -width indent
1273.It Li p6-baclears
1274Count the number of times a static branch prediction was made by the
1275branch decoder because the BTB did not have a prediction.
1276.It Li p6-br-bac-missp-exec
1277.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
1279mispredicted at the Front End (BAC).
1280.It Li p6-br-bogus
1281Count the number of bogus branches.
1282.It Li p6-br-call-exec
1283.Pq Tn "Pentium M"
1284Count the number of call instructions executed.
1285.It Li p6-br-call-missp-exec
1286.Pq Tn "Pentium M"
1287Count the number of call instructions executed that were mispredicted.
1288.It Li p6-br-cnd-exec
1289.Pq Tn "Pentium M"
1290Count the number of conditional branch instructions executed.
1291.It Li p6-br-cnd-missp-exec
1292.Pq Tn "Pentium M"
1293Count the number of conditional branch instructions executed that were
1294mispredicted.
1295.It Li p6-br-ind-call-exec
1296.Pq Tn "Pentium M"
1297Count the number of indirect call instructions executed.
1298.It Li p6-br-ind-exec
1299.Pq Tn "Pentium M"
1300Count the number of indirect branch instructions executed.
1301.It Li p6-br-ind-missp-exec
1302.Pq Tn "Pentium M"
1303Count the number of indirect branch instructions executed that were
1304mispredicted.
1305.It Li p6-br-inst-decoded
1306Count the number of branch instructions decoded.
1307.It Li p6-br-inst-exec
1308.Pq Tn "Pentium M"
Count the number of branch instructions executed but not necessarily retired.
1310.It Li p6-br-inst-retired
1311Count the number of branch instructions retired.
1312.It Li p6-br-miss-pred-retired
1313Count the number of mispredicted branch instructions retired.
1314.It Li p6-br-miss-pred-taken-ret
1315Count the number of taken mispredicted branches retired.
1316.It Li p6-br-missp-exec
1317.Pq Tn "Pentium M"
1318Count the number of branch instructions executed that were
1319mispredicted at execution.
1320.It Li p6-br-ret-bac-missp-exec
1321.Pq Tn "Pentium M"
1322Count the number of return instructions executed that were
1323mispredicted at the Front End (BAC).
1324.It Li p6-br-ret-exec
1325.Pq Tn "Pentium M"
1326Count the number of return instructions executed.
1327.It Li p6-br-ret-missp-exec
1328.Pq Tn "Pentium M"
1329Count the number of return instructions executed that were
1330mispredicted at execution.
1331.It Li p6-br-taken-retired
1332Count the number of taken branches retired.
1333.It Li p6-btb-misses
1334Count the number of branches for which the BTB did not produce a
1335prediction.
1336.It Li p6-bus-bnr-drv
1337Count the number of bus clock cycles during which this processor is
1338driving the BNR# pin.
1339.It Li p6-bus-data-rcv
1340Count the number of bus clock cycles during which this processor is
1341receiving data.
1342.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
1343Count the number of clocks during which DRDY# is asserted.
1344An additional qualifier may be specified, and comprises one of the
1345following keywords:
1346.Pp
1347.Bl -tag -width indent -compact
1348.It Li any
1349Count transactions generated by any agent on the bus.
1350.It Li self
1351Count transactions generated by this processor.
1352.El
1353.Pp
1354The default is to count operations generated by this processor.
1355.It Li p6-bus-hit-drv
1356Count the number of bus clock cycles during which this processor is
1357driving the HIT# pin.
1358.It Li p6-bus-hitm-drv
1359Count the number of bus clock cycles during which this processor is
1360driving the HITM# pin.
1361.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which LOCK# is asserted on the
1363external system bus.
1364An additional qualifier may be specified and comprises one of the following
1365keywords:
1366.Pp
1367.Bl -tag -width indent -compact
1368.It Li any
1369Count transactions generated by any agent on the bus.
1370.It Li self
1371Count transactions generated by this processor.
1372.El
1373.Pp
1374The default is to count operations generated by this processor.
1375.It Li p6-bus-req-outstanding
1376Count the number of bus requests outstanding in any given cycle.
1377.It Li p6-bus-snoop-stall
1378Count the number of clock cycles during which the bus is snoop stalled.
1379.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
1380Count the number of completed bus transactions of any kind.
1381An additional qualifier may be specified and comprises one of the following
1382keywords:
1383.Pp
1384.Bl -tag -width indent -compact
1385.It Li any
1386Count transactions generated by any agent on the bus.
1387.It Li self
1388Count transactions generated by this processor.
1389.El
1390.Pp
1391The default is to count operations generated by this processor.
1392.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
1393Count the number of burst read transactions.
1394An additional qualifier may be specified and comprises one of the following
1395keywords:
1396.Pp
1397.Bl -tag -width indent -compact
1398.It Li any
1399Count transactions generated by any agent on the bus.
1400.It Li self
1401Count transactions generated by this processor.
1402.El
1403.Pp
1404The default is to count operations generated by this processor.
1405.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
1406Count the number of completed burst transactions.
1407An additional qualifier may be specified and comprises one of the following
1408keywords:
1409.Pp
1410.Bl -tag -width indent -compact
1411.It Li any
1412Count transactions generated by any agent on the bus.
1413.It Li self
1414Count transactions generated by this processor.
1415.El
1416.Pp
1417The default is to count operations generated by this processor.
1418.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
1419Count the number of completed deferred transactions.
1420An additional qualifier may be specified and comprises one of the following
1421keywords:
1422.Pp
1423.Bl -tag -width indent -compact
1424.It Li any
1425Count transactions generated by any agent on the bus.
1426.It Li self
1427Count transactions generated by this processor.
1428.El
1429.Pp
1430The default is to count operations generated by this processor.
1431.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
1432Count the number of completed instruction fetch transactions.
1433An additional qualifier may be specified and comprises one of the following
1434keywords:
1435.Pp
1436.Bl -tag -width indent -compact
1437.It Li any
1438Count transactions generated by any agent on the bus.
1439.It Li self
1440Count transactions generated by this processor.
1441.El
1442.Pp
1443The default is to count operations generated by this processor.
1444.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
1445Count the number of completed invalidate transactions.
1446An additional qualifier may be specified and comprises one of the following
1447keywords:
1448.Pp
1449.Bl -tag -width indent -compact
1450.It Li any
1451Count transactions generated by any agent on the bus.
1452.It Li self
1453Count transactions generated by this processor.
1454.El
1455.Pp
1456The default is to count operations generated by this processor.
1457.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
1458Count the number of completed memory transactions.
1459An additional qualifier may be specified and comprises one of the following
1460keywords:
1461.Pp
1462.Bl -tag -width indent -compact
1463.It Li any
1464Count transactions generated by any agent on the bus.
1465.It Li self
1466Count transactions generated by this processor.
1467.El
1468.Pp
1469The default is to count operations generated by this processor.
1470.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
1471Count the number of completed partial write transactions.
1472An additional qualifier may be specified and comprises one of the following
1473keywords:
1474.Pp
1475.Bl -tag -width indent -compact
1476.It Li any
1477Count transactions generated by any agent on the bus.
1478.It Li self
1479Count transactions generated by this processor.
1480.El
1481.Pp
1482The default is to count operations generated by this processor.
1483.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
1484Count the number of completed read-for-ownership transactions.
1485An additional qualifier may be specified and comprises one of the following
1486keywords:
1487.Pp
1488.Bl -tag -width indent -compact
1489.It Li any
1490Count transactions generated by any agent on the bus.
1491.It Li self
1492Count transactions generated by this processor.
1493.El
1494.Pp
1495The default is to count operations generated by this processor.
1496.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
1497Count the number of completed I/O transactions.
1498An additional qualifier may be specified and comprises one of the following
1499keywords:
1500.Pp
1501.Bl -tag -width indent -compact
1502.It Li any
1503Count transactions generated by any agent on the bus.
1504.It Li self
1505Count transactions generated by this processor.
1506.El
1507.Pp
1508The default is to count operations generated by this processor.
1509.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
1510Count the number of completed partial transactions.
1511An additional qualifier may be specified and comprises one of the following
1512keywords:
1513.Pp
1514.Bl -tag -width indent -compact
1515.It Li any
1516Count transactions generated by any agent on the bus.
1517.It Li self
1518Count transactions generated by this processor.
1519.El
1520.Pp
1521The default is to count operations generated by this processor.
1522.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
1523Count the number of completed write-back transactions.
1524An additional qualifier may be specified and comprises one of the following
1525keywords:
1526.Pp
1527.Bl -tag -width indent -compact
1528.It Li any
1529Count transactions generated by any agent on the bus.
1530.It Li self
1531Count transactions generated by this processor.
1532.El
1533.Pp
1534The default is to count operations generated by this processor.
1535.It Li p6-cpu-clk-unhalted
Count the number of cycles during which the processor was not halted.
.Pp
.Pq Tn "Pentium M"
Count the number of cycles during which the processor was not halted
1540and not in a thermal trip.
1541.It Li p6-cycles-div-busy
1542Count the number of cycles during which the divider is busy and cannot
1543accept new divides.
1544This event is only allocated on counter 0.
.It Li p6-cycles-int-pending-and-masked
1546Count the number of processor cycles for which interrupts were
1547disabled and interrupts were pending.
1548.It Li p6-cycles-int-masked
1549Count the number of processor cycles for which interrupts were
1550disabled.
1551.It Li p6-data-mem-refs
1552Count all loads and all stores using any memory type, including
1553internal retries.
1554Each part of a split store is counted separately.
1555.It Li p6-dcu-lines-in
1556Count the total lines allocated in the data cache unit.
1557.It Li p6-dcu-m-lines-in
1558Count the number of M state lines allocated in the data cache unit.
1559.It Li p6-dcu-m-lines-out
1560Count the number of M state lines evicted from the data cache unit.
1561.It Li p6-dcu-miss-outstanding
1562Count the weighted number of cycles while a data cache unit miss is
1563outstanding, incremented by the number of outstanding cache misses at
1564any time.
1565.It Li p6-div
1566Count the number of integer and floating-point divides including
1567speculative divides.
1568This event is only allocated on counter 1.
1569.It Li p6-emon-esp-uops
1570.Pq Tn "Pentium M"
1571Count the total number of micro-ops.
1572.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
1573.Pq Tn "Pentium M"
1574Count the number of
1575.Tn "Enhanced Intel SpeedStep"
1576transitions.
1577An additional qualifier may be specified, and can be one of the
1578following keywords:
1579.Pp
1580.Bl -tag -width indent -compact
1581.It Li all
1582Count all transitions.
1583.It Li freq
1584Count only frequency transitions.
1585.El
1586.Pp
1587The default is to count all transitions.
1588.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
1589.Pq Tn "Pentium M"
1590Count the number of retired fused micro-ops.
1591An additional qualifier may be specified, and may be one of the
1592following keywords:
1593.Pp
1594.Bl -tag -width indent -compact
1595.It Li all
1596Count all fused micro-ops.
1597.It Li loadop
1598Count only load and op micro-ops.
1599.It Li stdsta
1600Count only STD/STA micro-ops.
1601.El
1602.Pp
1603The default is to count all fused micro-ops.
.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
1605.Pq Tn "Pentium III"
1606Count the number of SSE computational instructions retired.
1607An additional qualifier may be specified, and comprises one of the
1608following keywords:
1609.Pp
1610.Bl -tag -width indent -compact
1611.It Li packed-and-scalar
1612Count packed and scalar operations.
1613.It Li scalar
1614Count scalar operations only.
1615.El
1616.Pp
1617The default is to count packed and scalar operations.
1618.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
1619.Pq Tn "Pentium III"
1620Count the number of SSE instructions retired.
1621An additional qualifier may be specified, and comprises one of the
1622following keywords:
1623.Pp
1624.Bl -tag -width indent -compact
1625.It Li packed-and-scalar
1626Count packed and scalar operations.
1627.It Li scalar
1628Count scalar operations only.
1629.El
1630.Pp
1631The default is to count packed and scalar operations.
1632.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
1633.Pq Tn "Pentium III"
1634Count the number of SSE prefetch or weakly ordered instructions
1635dispatched (including speculative prefetches).
1636An additional qualifier may be specified, and comprises one of the
1637following keywords:
1638.Pp
1639.Bl -tag -width indent -compact
1640.It Li nta
1641Count non-temporal prefetches.
1642.It Li t1
1643Count prefetches to L1.
1644.It Li t2
1645Count prefetches to L2.
1646.It Li wos
1647Count weakly ordered stores.
1648.El
1649.Pp
1650The default is to count non-temporal prefetches.
1651.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
1652.Pq Tn "Pentium III"
1653Count the number of prefetch or weakly ordered instructions that miss
1654all caches.
1655An additional qualifier may be specified, and comprises one of the
1656following keywords:
1657.Pp
1658.Bl -tag -width indent -compact
1659.It Li nta
1660Count non-temporal prefetches.
1661.It Li t1
1662Count prefetches to L1.
1663.It Li t2
1664Count prefetches to L2.
1665.It Li wos
1666Count weakly ordered stores.
1667.El
1668.Pp
1669The default is to count non-temporal prefetches.
1670.It Li p6-emon-pref-rqsts-dn
1671.Pq Tn "Pentium M"
1672Count the number of downward prefetches issued.
1673.It Li p6-emon-pref-rqsts-up
1674.Pq Tn "Pentium M"
1675Count the number of upward prefetches issued.
1676.It Li p6-emon-simd-instr-retired
1677.Pq Tn "Pentium M"
1678Count the number of retired
1679.Tn MMX
1680instructions.
1681.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
1682.Pq Tn "Pentium M"
1683Count the number of computational SSE instructions retired.
1684An additional qualifier may be specified and can be one of the
1685following keywords:
1686.Pp
1687.Bl -tag -width indent -compact
1688.It Li sse-packed-single
1689Count SSE packed-single instructions.
1690.It Li sse-scalar-single
1691Count SSE scalar-single instructions.
1692.It Li sse2-packed-double
1693Count SSE2 packed-double instructions.
1694.It Li sse2-scalar-double
1695Count SSE2 scalar-double instructions.
1696.El
1697.Pp
1698The default is to count SSE packed-single instructions.
.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
1701.Pq Tn "Pentium M"
1702Count the number of SSE instructions retired.
1703An additional qualifier can be specified, and can be one of the
1704following keywords:
1705.Pp
1706.Bl -tag -width indent -compact
1707.It Li sse-packed-single
1708Count SSE packed-single instructions.
1709.It Li sse-packed-single-scalar-single
1710Count SSE packed-single and scalar-single instructions.
1711.It Li sse2-packed-double
1712Count SSE2 packed-double instructions.
1713.It Li sse2-scalar-double
1714Count SSE2 scalar-double instructions.
1715.El
1716.Pp
1717The default is to count SSE packed-single instructions.
1718.It Li p6-emon-synch-uops
1719.Pq Tn "Pentium M"
1720Count the number of sync micro-ops.
1721.It Li p6-emon-thermal-trip
1722.Pq Tn "Pentium M"
1723Count the duration or occurrences of thermal trips.
1724Use the
1725.Dq Li edge
1726qualifier to count occurrences of thermal trips.
1727.It Li p6-emon-unfusion
1728.Pq Tn "Pentium M"
1729Count the number of unfusion events in the reorder buffer.
1730.It Li p6-flops
1731Count the number of computational floating point operations retired.
1732This event is only allocated on counter 0.
1733.It Li p6-fp-assist
1734Count the number of floating point exceptions handled by microcode.
1735This event is only allocated on counter 1.
1736.It Li p6-fp-comps-ops-exe
Count the number of computational floating point operations executed.
1738This event is only allocated on counter 0.
1739.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
1740.Pq Tn "Pentium II" , Tn "Pentium III"
1741Count the number of transitions between MMX and floating-point
1742instructions.
1743An additional qualifier may be specified, and comprises one of the
1744following keywords:
1745.Pp
1746.Bl -tag -width indent -compact
1747.It Li mmxtofp
1748Count transitions from MMX instructions to floating-point instructions.
1749.It Li fptommx
1750Count transitions from floating-point instructions to MMX instructions.
1751.El
1752.Pp
1753The default is to count MMX to floating-point transitions.
1754.It Li p6-hw-int-rx
1755Count the number of hardware interrupts received.
1756.It Li p6-ifu-fetch
1757Count the number of instruction fetches, both cacheable and non-cacheable.
1758.It Li p6-ifu-fetch-miss
1759Count the number of instruction fetch misses (i.e., those that produce
1760memory accesses).
1761.It Li p6-ifu-mem-stall
1762Count the number of cycles instruction fetch is stalled for any reason.
1763.It Li p6-ild-stall
1764Count the number of cycles the instruction length decoder is stalled.
1765.It Li p6-inst-decoded
1766Count the number of instructions decoded.
1767.It Li p6-inst-retired
1768Count the number of instructions retired.
1769.It Li p6-itlb-miss
1770Count the number of instruction TLB misses.
1771.It Li p6-l2-ads
1772Count the number of L2 address strobes.
1773.It Li p6-l2-dbus-busy
1774Count the number of cycles during which the L2 cache data bus was busy.
1775.It Li p6-l2-dbus-busy-rd
1776Count the number of cycles during which the L2 cache data bus was busy
1777transferring read data from L2 to the processor.
1778.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
1779Count the number of L2 instruction fetches.
1780An additional qualifier may be specified and comprises a list of the following
1781keywords separated by
1782.Ql +
1783characters:
1784.Pp
1785.Bl -tag -width indent -compact
1786.It Li e
1787Count operations affecting E (exclusive) state lines.
1788.It Li i
1789Count operations affecting I (invalid) state lines.
1790.It Li m
1791Count operations affecting M (modified) state lines.
1792.It Li s
1793Count operations affecting S (shared) state lines.
1794.El
1795.Pp
1796The default is to count operations affecting all (MESI) state lines.
1797.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
1798Count the number of L2 data loads.
1799An additional qualifier may be specified and comprises a list of the following
1800keywords separated by
1801.Ql +
1802characters:
1803.Pp
1804.Bl -tag -width indent -compact
1805.It Li both
1806.Pq Tn "Pentium M"
1807Count both hardware-prefetched lines and non-hardware-prefetched lines.
1808.It Li e
1809Count operations affecting E (exclusive) state lines.
1810.It Li hw
1811.Pq Tn "Pentium M"
1812Count hardware-prefetched lines only.
1813.It Li i
1814Count operations affecting I (invalid) state lines.
1815.It Li m
1816Count operations affecting M (modified) state lines.
1817.It Li nonhw
1818.Pq Tn "Pentium M"
1819Exclude hardware-prefetched lines.
1820.It Li s
1821Count operations affecting S (shared) state lines.
1822.El
1823.Pp
1824The default on processors other than
1825.Tn "Pentium M"
1826processors is to count operations affecting all (MESI) state lines.
1827The default on
1828.Tn "Pentium M"
1829processors is to count both hardware-prefetched and
1830non-hardware-prefetch operations on all (MESI) state lines.
1831.Pq Errata
This event is affected by processor erratum E53.
1833.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
1834Count the number of L2 lines allocated.
1835An additional qualifier may be specified and comprises a list of the following
1836keywords separated by
1837.Ql +
1838characters:
1839.Pp
1840.Bl -tag -width indent -compact
1841.It Li both
1842.Pq Tn "Pentium M"
1843Count both hardware-prefetched lines and non-hardware-prefetched lines.
1844.It Li e
1845Count operations affecting E (exclusive) state lines.
1846.It Li hw
1847.Pq Tn "Pentium M"
1848Count hardware-prefetched lines only.
1849.It Li i
1850Count operations affecting I (invalid) state lines.
1851.It Li m
1852Count operations affecting M (modified) state lines.
1853.It Li nonhw
1854.Pq Tn "Pentium M"
1855Exclude hardware-prefetched lines.
1856.It Li s
1857Count operations affecting S (shared) state lines.
1858.El
1859.Pp
1860The default on processors other than
1861.Tn "Pentium M"
1862processors is to count operations affecting all (MESI) state lines.
1863The default on
1864.Tn "Pentium M"
1865processors is to count both hardware-prefetched and
1866non-hardware-prefetch operations on all (MESI) state lines.
1867.Pq Errata
This event is affected by processor erratum E45.
1869.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
1870Count the number of L2 lines evicted.
1871An additional qualifier may be specified and comprises a list of the following
1872keywords separated by
1873.Ql +
1874characters:
1875.Pp
1876.Bl -tag -width indent -compact
1877.It Li both
1878.Pq Tn "Pentium M"
1879Count both hardware-prefetched lines and non-hardware-prefetched lines.
1880.It Li e
1881Count operations affecting E (exclusive) state lines.
1882.It Li hw
1883.Pq Tn "Pentium M"
1884Count hardware-prefetched lines only.
1885.It Li i
1886Count operations affecting I (invalid) state lines.
1887.It Li m
1888Count operations affecting M (modified) state lines.
1889.It Li nonhw
.Pq Tn "Pentium M"
1891Exclude hardware-prefetched lines.
1892.It Li s
1893Count operations affecting S (shared) state lines.
1894.El
1895.Pp
1896The default on processors other than
1897.Tn "Pentium M"
1898processors is to count operations affecting all (MESI) state lines.
1899The default on
1900.Tn "Pentium M"
1901processors is to count both hardware-prefetched and
1902non-hardware-prefetch operations on all (MESI) state lines.
1903.Pq Errata
1904This event is affected by processor errata E45.
1905.It Li p6-l2-m-lines-inm
1906Count the number of modified lines allocated in L2 cache.
1907.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
1908Count the number of L2 M-state lines evicted.
1909.Pp
1910.Pq Tn "Pentium M"
1911On these processors an additional qualifier may be specified and
1912comprises a list of the following keywords separated by
1913.Ql +
1914characters:
1915.Pp
1916.Bl -tag -width indent -compact
1917.It Li both
1918Count both hardware-prefetched lines and non-hardware-prefetched lines.
1919.It Li hw
1920Count hardware-prefetched lines only.
1921.It Li nonhw
1922Exclude hardware-prefetched lines.
1923.El
1924.Pp
1925The default is to count both hardware-prefetched and
1926non-hardware-prefetch operations.
1927.Pq Errata
1928This event is affected by processor errata E53.
1929.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
1930Count the total number of L2 requests.
1931An additional qualifier may be specified and comprises a list of the following
1932keywords separated by
1933.Ql +
1934characters:
1935.Pp
1936.Bl -tag -width indent -compact
1937.It Li e
1938Count operations affecting E (exclusive) state lines.
1939.It Li i
1940Count operations affecting I (invalid) state lines.
1941.It Li m
1942Count operations affecting M (modified) state lines.
1943.It Li s
1944Count operations affecting S (shared) state lines.
1945.El
1946.Pp
1947The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier
1949Count the number of L2 data stores.
1950An additional qualifier may be specified and comprises a list of the following
1951keywords separated by
1952.Ql +
1953characters:
1954.Pp
1955.Bl -tag -width indent -compact
1956.It Li e
1957Count operations affecting E (exclusive) state lines.
1958.It Li i
1959Count operations affecting I (invalid) state lines.
1960.It Li m
1961Count operations affecting M (modified) state lines.
1962.It Li s
1963Count operations affecting S (shared) state lines.
1964.El
1965.Pp
1966The default is to count operations affecting all (MESI) state lines.
1967.It Li p6-ld-blocks
1968Count the number of load operations delayed due to store buffer blocks.
1969.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (crossing a
64-bit boundary).
1972.It Li p6-mmx-assist
1973.Pq Tn "Pentium II" , Tn "Pentium III"
1974Count the number of MMX assists executed.
1975.It Li p6-mmx-instr-exec
1976.Pq Tn Celeron , Tn "Pentium II"
1977Count the number of MMX instructions executed, except MOVQ and MOVD
1978stores from register to memory.
1979.It Li p6-mmx-instr-ret
1980.Pq Tn "Pentium II"
1981Count the number of MMX instructions retired.
1982.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
1983.Pq Tn "Pentium II" , Tn "Pentium III"
1984Count the number of MMX instructions executed.
1985An additional qualifier may be specified and comprises a list of
1986the following keywords separated by
1987.Ql +
1988characters:
1989.Pp
1990.Bl -tag -width indent -compact
1991.It Li pack
1992Count MMX pack operation instructions.
1993.It Li packed-arithmetic
1994Count MMX packed arithmetic instructions.
1995.It Li packed-logical
1996Count MMX packed logical instructions.
1997.It Li packed-multiply
1998Count MMX packed multiply instructions.
1999.It Li packed-shift
2000Count MMX packed shift instructions.
2001.It Li unpack
2002Count MMX unpack operation instructions.
2003.El
2004.Pp
2005The default is to count all operations.
2006.It Li p6-mmx-sat-instr-exec
2007.Pq Tn "Pentium II" , Tn "Pentium III"
2008Count the number of MMX saturating instructions executed.
2009.It Li p6-mmx-uops-exec
2010.Pq Tn "Pentium II" , Tn "Pentium III"
2011Count the number of MMX micro-ops executed.
2012.It Li p6-mul
2013Count the number of integer and floating-point multiplies, including
2014speculative multiplies.
2015This event is only allocated on counter 1.
2016.It Li p6-partial-rat-stalls
2017Count the number of cycles or events for partial stalls.
2018.It Li p6-resource-stalls
2019Count the number of cycles there was a resource related stall of any kind.
2020.It Li p6-ret-seg-renames
2021.Pq Tn "Pentium II" , Tn "Pentium III"
2022Count the number of segment register rename events retired.
2023.It Li p6-sb-drains
2024Count the number of cycles the store buffer is draining.
2025.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
2026.Pq Tn "Pentium II" , Tn "Pentium III"
2027Count the number of segment register renames.
2028An additional qualifier may be specified, and comprises a list of the
2029following keywords separated by
2030.Ql +
2031characters:
2032.Pp
2033.Bl -tag -width indent -compact
2034.It Li ds
2035Count renames for segment register DS.
2036.It Li es
2037Count renames for segment register ES.
2038.It Li fs
2039Count renames for segment register FS.
2040.It Li gs
2041Count renames for segment register GS.
2042.El
2043.Pp
2044The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
2046.Pq Tn "Pentium II" , Tn "Pentium III"
2047Count the number of segment register renaming stalls.
2048An additional qualifier may be specified, and comprises a list of the
2049following keywords separated by
2050.Ql +
2051characters:
2052.Pp
2053.Bl -tag -width indent -compact
2054.It Li ds
2055Count stalls for segment register DS.
2056.It Li es
2057Count stalls for segment register ES.
2058.It Li fs
2059Count stalls for segment register FS.
2060.It Li gs
2061Count stalls for segment register GS.
2062.El
2063.Pp
2064The default is to count operations affecting all the segment registers.
2065.It Li p6-segment-reg-loads
2066Count the number of segment register loads.
2067.It Li p6-uops-retired
2068Count the number of micro-ops retired.
2069.El
2070.Ss Intel P4 PMCS
2071Intel P4 PMCs are present in Intel
2072.Tn "Pentium 4"
2073and
2074.Tn Xeon
2075processors.
2076These PMCs are documented in
2077.Rs
2078.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
2079.%T "Volume 3: System Programming Guide"
2080.%N "Order Number 245472-012"
2081.%D 2003
2082.%Q "Intel Corporation"
2083.Re
2084Further information about using these PMCs may be found in
2085.Rs
2086.%B "IA-32 Intel(R) Architecture Optimization Guide"
2087.%D 2003
2088.%N "Order Number 248966-009"
2089.%Q "Intel Corporation"
2090.Re
2091Some of these events are affected by processor errata described in
2092.Rs
2093.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
2094.%N "Document Number: 249199-059"
2095.%D "April 2005"
2096.%Q "Intel Corporation"
2097.Re
2098.Pp
2099Event specifiers for Intel P4 PMCs can have the following common
2100qualifiers:
2101.Bl -tag -width indent
2102.It Li active= Ns Ar choice
2103(On P4 HTT CPUs) Filter event counting based on which logical
2104processors are active.
2105The allowed values of
2106.Ar choice
2107are:
2108.Pp
2109.Bl -tag -width indent -compact
2110.It Li any
2111Count when either logical processor is active.
2112.It Li both
2113Count when both logical processors are active.
2114.It Li none
2115Count only when neither logical processor is active.
2116.It Li single
2117Count only when one logical processor is active.
2118.El
2119.Pp
2120The default is
2121.Dq Li both .
2122.It Li cascade
2123Configure the PMC to cascade onto its partner.
2124See
2125.Sx "Cascading P4 PMCs"
2126below for more information.
2127.It Li edge
Configure the counter to count false to true transitions of the threshold
comparison output.
2130This qualifier only takes effect if a threshold qualifier has also been
2131specified.
2132.It Li complement
2133Configure the counter to increment only when the event count seen is
2134less than the threshold qualifier value specified.
2135.It Li mask= Ns Ar qualifier
2136Many event specifiers for Intel P4 PMCs need to be additionally
2137qualified using a mask qualifier.
2138The allowed syntax for these qualifiers is event specific and is
2139described along with the events.
2140.It Li os
2141Configure the PMC to count when the CPL of the processor is 0.
2142.It Li precise
2143Select precise event based sampling.
2144Precise sampling is supported by the hardware for a limited set of
2145events.
2146.It Li tag= Ns Ar value
2147Configure the PMC to tag the internal uop selected by the other
2148fields in this event specifier with value
2149.Ar value .
2150This feature is used when cascading PMCs.
2151.It Li threshold= Ns Ar value
2152Configure the PMC to increment only when the event counts seen are
2153greater than the specified threshold value
2154.Ar value .
2155.It Li usr
2156Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
2157.El
2158.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
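.Pp
The qualifiers above are appended to the event name as a comma-separated
list, with mask flags joined by
.Ql +
characters.
The following is a minimal sketch of composing such a string.
The function name
.Fn example_build_spec
and its parameters are hypothetical and are not part of
.Lb libpmc .

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): write
 * "<event>,mask=<f1>+<f2>...[,<extra>]" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int
example_build_spec(char *buf, size_t len, const char *event,
    const char *const *flags, size_t nflags, const char *extra)
{
	size_t i, used;
	int n;

	assert(buf != NULL && event != NULL);

	/* Start with the bare event name. */
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;

	/* The first flag introduces ",mask="; later flags are joined by '+'. */
	for (i = 0; i < nflags; i++) {
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", flags[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}

	/* Optional trailing common qualifier such as "usr" or "os". */
	if (extra != NULL) {
		n = snprintf(buf + used, len - used, ",%s", extra);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
	}
	return (0);
}
```

Called with the event name p4-branch-retired, the flags mmtp and mmtm, and
the extra qualifier usr, the helper produces a specifier that restricts
counting to taken branches retired at CPL 1, 2 or 3.
The resulting string is intended to be passed as the event specifier
argument of
.Fn pmc_allocate .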
2164.Pp
2165On Intel Pentium 4 processors with HTT, events are
2166divided into two classes:
2167.Pp
2168.Bl -tag -width indent -compact
2169.It "TS Events"
are those where hardware can differentiate events
generated on one logical processor from those generated on the
other.
2173.It "TI Events"
2174are those where hardware cannot differentiate between events
2175generated by multiple logical processors in a package.
2176.El
2177.Pp
2178Only TS events are allowed for use with process-mode PMCs on
2179Pentium-4/HTT CPUs.
2180.Pp
2181The event specifiers supported by Intel P4 PMCs are:
2182.Pp
2183.Bl -tag -width indent
2184.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
2185.Pq "TI event"
2186Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
2187operands.
2188Qualifier
2189.Ar flags
2190can take the following value (which is also the default):
2191.Pp
2192.Bl -tag -width indent -compact
2193.It Li all
2194Count all uops operating on 128 bit SIMD integer operands in memory or
2195XMM register.
2196.El
2197.Pp
2198If an instruction contains more than one 128 bit MMX uop, then each
2199uop will be counted.
2200.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
2201.Pq "TI event"
2202Count MMX instructions that operate on 64 bit SIMD operands.
2203Qualifier
2204.Ar flags
2205can take the following value (which is also the default):
2206.Pp
2207.Bl -tag -width indent -compact
2208.It Li all
2209Count all uops operating on 64 bit SIMD integer operands in memory or
2210in MMX registers.
2211.El
2212.Pp
2213If an instruction contains more than one 64 bit MMX uop, then each
2214uop will be counted.
2215.It Li p4-b2b-cycles
2216.Pq "TI event"
Count back-to-back bus cycles.
2218Further documentation for this event is unavailable.
2219.It Li p4-bnr
2220.Pq "TI event"
2221Count bus-not-ready conditions.
2222Further documentation for this event is unavailable.
2223.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
2224.Pq "TS event"
2225Count instruction fetch requests qualified by additional
2226flags specified in
2227.Ar qualifier .
2228At this point only one flag is supported:
2229.Pp
2230.Bl -tag -width indent -compact
2231.It Li tcmiss
2232Count trace cache lookup misses.
2233.El
2234.Pp
2235The default qualifier is also
2236.Dq Li mask=tcmiss .
2237.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
2238.Pq "TS event"
Count retired branches.
2240Qualifier
2241.Ar flags
2242is a list of the following
2243.Ql +
2244separated strings:
2245.Pp
2246.Bl -tag -width indent -compact
2247.It Li mmnp
2248Count branches not-taken and predicted.
2249.It Li mmnm
2250Count branches not-taken and mis-predicted.
2251.It Li mmtp
2252Count branches taken and predicted.
2253.It Li mmtm
2254Count branches taken and mis-predicted.
2255.El
2256.Pp
2257The default qualifier counts all four kinds of branches.
2258.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
2259.Pq "TS event"
2260Count the number of entries (clipped at 15) currently active in the
2261BSQ.
2262Qualifier
2263.Ar qualifier
2264is a
2265.Ql +
2266separated set of the following flags:
2267.Pp
2268.Bl -tag -width indent -compact
2269.It Li req-type0 , Li req-type1
2270Forms a 2-bit number used to select the request type encoding:
2271.Pp
2272.Bl -tag -width indent -compact
2273.It Li 0
2274reads excluding read invalidate
2275.It Li 1
2276read invalidates
2277.It Li 2
2278writes other than writebacks
2279.It Li 3
2280writebacks
2281.El
2282.Pp
2283Bit
2284.Dq Li req-type1
2285is the MSB for this two bit number.
2286.It Li req-len0 , Li req-len1
2287Forms a two-bit number that specifies the request length encoding:
2288.Pp
2289.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
2296.El
2297.Pp
2298Bit
2299.Dq Li req-len1
2300is the MSB for this two bit number.
2301.It Li req-io-type
2302Count requests that are input or output requests.
2303.It Li req-lock-type
2304Count requests that lock the bus.
2305.It Li req-lock-cache
2306Count requests that lock the cache.
2307.It Li req-split-type
Count requests that are bus 8-byte chunks split across an
8-byte boundary.
2310.It Li req-dem-type
2311Count requests that are demand (not prefetches) if set.
2312Count requests that are prefetches if not set.
2313.It Li req-ord-type
2314Count requests that are ordered.
2315.It Li mem-type0 , Li mem-type1 , Li mem-type2
2316Forms a 3-bit number that specifies a memory type encoding:
2317.Pp
2318.Bl -tag -width indent -compact
2319.It Li 0
2320UC
2321.It Li 1
2322USWC
2323.It Li 4
2324WT
2325.It Li 5
2326WP
2327.It Li 6
2328WB
2329.El
2330.Pp
2331Bit
2332.Dq Li mem-type2
2333is the MSB of this 3-bit number.
2334.El
2335.Pp
2336The default qualifier has all the above bits set.
2337.Pp
2338Edge triggering using the
2339.Dq Li edge
2340qualifier should not be used with this event when counting cycles.
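.Pp
The multi-bit encodings above are selected by naming the flag for each
bit that is set.
The following is a minimal sketch of that mapping, assuming that a clear
bit is expressed by omitting its flag.
The function name
.Fn example_bits_to_flags
is hypothetical and is not part of
.Lb libpmc .

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): append the flag names whose
 * bits are set in value to buf, joined by '+'.  names[0] names the least
 * significant bit.  buf must already hold a NUL-terminated string, which
 * may be empty.  Returns 0 on success, -1 if buf is too small.
 */
int
example_bits_to_flags(char *buf, size_t len, unsigned value,
    const char *const *names, size_t nbits)
{
	size_t i, used;
	int n;

	assert(buf != NULL && len > 0);
	used = strlen(buf);
	for (i = 0; i < nbits; i++) {
		if ((value & (1u << i)) == 0)
			continue;	/* bit clear: omit this flag */
		n = snprintf(buf + used, len - used, "%s%s",
		    used > 0 ? "+" : "", names[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

With the names req-type0 and req-type1, the value 2 (writes other than
writebacks) yields req-type1 alone, because that flag is the MSB.
With the names mem-type0, mem-type1 and mem-type2, the value 6 (WB)
yields mem-type1+mem-type2.
An encoding of 0 contributes no flags at all.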
2341.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
2342.Pq "TS event"
2343Count allocations in the bus sequence unit according to the flags
2344specified in
2345.Ar qualifier ,
2346which is a
2347.Ql +
2348separated set of the following flags:
2349.Pp
2350.Bl -tag -width indent -compact
2351.It Li req-type0 , Li req-type1
2352Forms a 2-bit number used to select the request type encoding:
2353.Pp
2354.Bl -tag -width indent -compact
2355.It Li 0
2356reads excluding read invalidate
2357.It Li 1
2358read invalidates
2359.It Li 2
2360writes other than writebacks
2361.It Li 3
2362writebacks
2363.El
2364.Pp
2365Bit
2366.Dq Li req-type1
2367is the MSB for this two bit number.
2368.It Li req-len0 , Li req-len1
2369Forms a two-bit number that specifies the request length encoding:
2370.Pp
2371.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
2378.El
2379.Pp
2380Bit
2381.Dq Li req-len1
2382is the MSB for this two bit number.
2383.It Li req-io-type
2384Count requests that are input or output requests.
2385.It Li req-lock-type
2386Count requests that lock the bus.
2387.It Li req-lock-cache
2388Count requests that lock the cache.
2389.It Li req-split-type
Count requests that are bus 8-byte chunks split across an
8-byte boundary.
2392.It Li req-dem-type
2393Count requests that are demand (not prefetches) if set.
2394Count requests that are prefetches if not set.
2395.It Li req-ord-type
2396Count requests that are ordered.
2397.It Li mem-type0 , Li mem-type1 , Li mem-type2
2398Forms a 3-bit number that specifies a memory type encoding:
2399.Pp
2400.Bl -tag -width indent -compact
2401.It Li 0
2402UC
2403.It Li 1
2404USWC
2405.It Li 4
2406WT
2407.It Li 5
2408WP
2409.It Li 6
2410WB
2411.El
2412.Pp
2413Bit
2414.Dq Li mem-type2
2415is the MSB of this 3-bit number.
2416.El
2417.Pp
2418The default qualifier has all the above bits set.
2419.Pp
2420This event is usually used along with the
2421.Dq Li edge
2422qualifier to avoid multiple counting.
2423.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
2424.Pq "TS event"
2425Count cache references as seen by the bus unit (2nd or 3rd level
2426cache references).
2427Qualifier
2428.Ar qualifier
2429is a
2430.Ql +
2431separated list of the following keywords:
2432.Pp
2433.Bl -tag -width indent -compact
2434.It Li rd-2ndl-hits
2435Count 2nd level cache hits in the shared state.
2436.It Li rd-2ndl-hite
2437Count 2nd level cache hits in the exclusive state.
2438.It Li rd-2ndl-hitm
2439Count 2nd level cache hits in the modified state.
2440.It Li rd-3rdl-hits
2441Count 3rd level cache hits in the shared state.
2442.It Li rd-3rdl-hite
2443Count 3rd level cache hits in the exclusive state.
2444.It Li rd-3rdl-hitm
2445Count 3rd level cache hits in the modified state.
2446.It Li rd-2ndl-miss
2447Count 2nd level cache misses.
2448.It Li rd-3rdl-miss
2449Count 3rd level cache misses.
2450.It Li wr-2ndl-miss
2451Count write-back lookups from the data access cache that miss the 2nd
2452level cache.
2453.El
2454.Pp
2455The default is to count all the above events.
2456.It Li p4-execution-event Op Li ,mask= Ns Ar flags
2457.Pq "TS event"
2458Count the retirement of tagged uops selected through the execution
2459tagging mechanism.
2460Qualifier
2461.Ar flags
2462can contain the following strings separated by
2463.Ql +
2464characters:
2465.Pp
2466.Bl -tag -width indent -compact
2467.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
2468The marked uops are not bogus.
2469.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
2470The marked uops are bogus.
2471.El
2472.Pp
2473This event requires additional (upstream) events to be allocated to
2474perform the desired uop tagging.
2475The default is to set all the above flags.
2476This event can be used for precise event based sampling.
2477.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
2478.Pq "TS event"
2479Count the retirement of tagged uops selected through the front-end
2480tagging mechanism.
2481Qualifier
2482.Ar flags
2483can contain the following strings separated by
2484.Ql +
2485characters:
2486.Pp
2487.Bl -tag -width indent -compact
2488.It Li nbogus
2489The marked uops are not bogus.
2490.It Li bogus
2491The marked uops are bogus.
2492.El
2493.Pp
2494This event requires additional (upstream) events to be allocated to
2495perform the desired uop tagging.
2496The default is to select both kinds of events.
2497This event can be used for precise event based sampling.
2498.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
2499.Pq "TI event"
2500Count each DBSY or DRDY event selected by qualifier
2501.Ar flags .
2502Qualifier
2503.Ar flags
2504is a
2505.Ql +
2506separated set of the following flags:
2507.Pp
2508.Bl -tag -width indent -compact
2509.It Li drdy-drv
2510Count when this processor is driving data onto the bus.
2511.It Li drdy-own
2512Count when this processor is reading data from the bus.
2513.It Li drdy-other
2514Count when data is on the bus but not being sampled by this processor.
2515.It Li dbsy-drv
2516Count when this processor reserves the bus for use in the next cycle
2517in order to drive data.
2518.It Li dbsy-own
2519Count when some agent reserves the bus for use in the next bus cycle
2520to drive data that this processor will sample.
2521.It Li dbsy-other
2522Count when some agent reserves the bus for use in the next bus cycle
2523to drive data that this processor will not sample.
2524.El
2525.Pp
2526Flags
2527.Dq Li drdy-own
2528and
2529.Dq Li drdy-other
2530are mutually exclusive.
2531Flags
2532.Dq Li dbsy-own
2533and
2534.Dq Li dbsy-other
2535are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
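.Pp
The two exclusivity rules above can be checked before a mask is used.
The following is a minimal sketch of such a check.
The function names
.Fn example_mask_has
and
.Fn example_fsb_mask_ok
are hypothetical and are not part of
.Lb libpmc .

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if flag appears as a whole
 * '+'-separated token in mask, else 0.
 */
int
example_mask_has(const char *mask, const char *flag)
{
	size_t flen = strlen(flag);
	const char *p = mask;

	assert(flen > 0);
	while ((p = strstr(p, flag)) != NULL) {
		int starts = (p == mask || p[-1] == '+');
		int ends = (p[flen] == '+' || p[flen] == 0);

		if (starts && ends)
			return (1);
		p++;	/* partial match only: keep searching */
	}
	return (0);
}

/* Return 1 if mask respects both exclusivity rules, else 0. */
int
example_fsb_mask_ok(const char *mask)
{
	if (example_mask_has(mask, "drdy-own") &&
	    example_mask_has(mask, "drdy-other"))
		return (0);
	if (example_mask_has(mask, "dbsy-own") &&
	    example_mask_has(mask, "dbsy-other"))
		return (0);
	return (1);
}
```

The default mask passes this check, while a mask naming both drdy-own
and drdy-other is rejected.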
2540.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
2541.Pq "TS event"
2542Count cycles during which the processor is not stopped.
2543Qualifier
2544.Ar flags
2545can take the following value (which is also the default):
2546.Pp
2547.Bl -tag -width indent -compact
2548.It Li running
2549Count cycles when the processor is active.
.El
2552.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
2553.Pq "TS event"
2554Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
2558.Ql +
2559characters:
2560.Pp
2561.Bl -tag -width indent -compact
2562.It Li nbogusntag
2563Count non-bogus instructions that are not tagged.
2564.It Li nbogustag
2565Count non-bogus instructions that are tagged.
2566.It Li bogusntag
2567Count bogus instructions that are not tagged.
2568.It Li bogustag
2569Count bogus instructions that are tagged.
2570.El
2571.Pp
2572The default qualifier counts all the above kinds of instructions.
2573.It Li p4-ioq-active-entries Xo
2574.Op Li ,mask= Ns Ar qualifier
2575.Op Li ,busreqtype= Ns Ar req-type
2576.Xc
2577.Pq "TS event"
2578Count the number of entries (clipped at 15) in the IOQ that are
2579active.
2580The event masks are specified by qualifier
2581.Ar qualifier
2582and
2583.Ar req-type .
2584.Pp
2585Qualifier
2586.Ar qualifier
2587is a
2588.Ql +
2589separated set of the following flags:
2590.Pp
2591.Bl -tag -width indent -compact
2592.It Li all-read
2593Count read entries.
2594.It Li all-write
2595Count write entries.
2596.It Li mem-uc
2597Count entries accessing uncacheable memory.
2598.It Li mem-wc
2599Count entries accessing write-combining memory.
2600.It Li mem-wt
2601Count entries accessing write-through memory.
2602.It Li mem-wp
Count entries accessing write-protected memory.
2604.It Li mem-wb
2605Count entries accessing write-back memory.
2606.It Li own
2607Count store requests driven by the processor (i.e., not by other
2608processors or by DMA).
2609.It Li other
2610Count store requests driven by other processors or by DMA.
2611.It Li prefetch
2612Include hardware and software prefetch requests in the count.
2613.El
2614.Pp
2615The default value for
2616.Ar qualifier
2617is to enable all the above flags.
2618.Pp
2619The
2620.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
2623The default is 0.
2624.Pp
2625The
2626.Dq Li edge
2627qualifier should not be used when counting cycles with this event.
2628The exact behaviour of this event depends on the processor revision.
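.Pp
Because
.Ar req-type
is a 5-bit number, only the values 0 through 31 are meaningful.
The following is a minimal sketch that range-checks the value while
composing a specifier.
The function name
.Fn example_ioq_spec
is hypothetical and is not part of
.Lb libpmc ,
and writing the number in decimal is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): write
 * "<event>,mask=<mask>,busreqtype=<req_type>" into buf.
 * Decimal formatting of req_type is an assumption.
 * Returns 0 on success, -1 if req_type does not fit in 5 bits
 * or buf is too small.
 */
int
example_ioq_spec(char *buf, size_t len, const char *event,
    const char *mask, unsigned req_type)
{
	int n;

	assert(buf != NULL && event != NULL && mask != NULL);
	if (req_type > 31)
		return (-1);	/* does not fit in 5 bits */
	n = snprintf(buf, len, "%s,mask=%s,busreqtype=%u",
	    event, mask, req_type);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```

For example, the event p4-ioq-active-entries with mask all-read+own and
request type 3 gives a specifier ending in busreqtype=3, while a request
type of 32 is rejected.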
2629.It Li p4-ioq-allocation Xo
2630.Op Li ,mask= Ns Ar qualifier
2631.Op Li ,busreqtype= Ns Ar req-type
2632.Xc
2633.Pq "TS event"
2634Count various types of transactions on the bus matching the flags set
2635in
2636.Ar qualifier
2637and
2638.Ar req-type .
2639.Pp
2640Qualifier
2641.Ar qualifier
2642is a
2643.Ql +
2644separated set of the following flags:
2645.Pp
2646.Bl -tag -width indent -compact
2647.It Li all-read
2648Count read entries.
2649.It Li all-write
2650Count write entries.
2651.It Li mem-uc
2652Count entries accessing uncacheable memory.
2653.It Li mem-wc
2654Count entries accessing write-combining memory.
2655.It Li mem-wt
2656Count entries accessing write-through memory.
2657.It Li mem-wp
Count entries accessing write-protected memory.
2659.It Li mem-wb
2660Count entries accessing write-back memory.
2661.It Li own
2662Count store requests driven by the processor (i.e., not by other
2663processors or by DMA).
2664.It Li other
2665Count store requests driven by other processors or by DMA.
2666.It Li prefetch
2667Include hardware and software prefetch requests in the count.
2668.El
2669.Pp
2670The default value for
2671.Ar qualifier
2672is to enable all the above flags.
2673.Pp
2674The
2675.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
2678The default is 0.
2679.Pp
2680The
2681.Dq Li edge
2682qualifier is normally used with this event to prevent multiple
2683counting.
2684The exact behaviour of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
2694.Pp
2695.Bl -tag -width indent -compact
2696.It Li hit
2697Count ITLB hits.
2698.It Li miss
2699Count ITLB misses.
2700.It Li hit-uc
2701Count uncacheable ITLB hits.
2702.El
2703.Pp
2704If no
2705.Ar qualifier
is specified, the default is to count all three kinds of ITLB
translations.
2708.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
2709.Pq "TS event"
2710Count replayed events at the load port.
2711Qualifier
2712.Ar qualifier
2713can take on one value:
2714.Pp
2715.Bl -tag -width indent -compact
2716.It Li split-ld
2717Count split loads.
2718.El
2719.Pp
2720The default value for
2721.Ar qualifier
2722is
2723.Dq Li split-ld .
2724.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
2725.Pq "TS event"
2726Count mispredicted IA-32 branch instructions.
2727Qualifier
2728.Ar flags
2729can take the following value (which is also the default):
2730.Pp
2731.Bl -tag -width indent -compact
2732.It Li nbogus
2733Count non-bogus retired branch instructions.
2734.El
2735.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
2736.Pq "TS event"
2737Count the number of pipeline clears seen by the processor.
Qualifier
2739.Ar flags
2740is a list of the following strings separated by
2741.Ql +
2742characters:
2743.Pp
2744.Bl -tag -width indent -compact
2745.It Li clear
2746Count for a portion of the many cycles when the machine is being
2747cleared for any reason.
2748.It Li moclear
2749Count machine clears due to memory ordering issues.
2750.It Li smclear
2751Count machine clears due to self-modifying code.
2752.El
2753.Pp
2754Use qualifier
2755.Dq Li edge
2756to get a count of occurrences of machine clears.
2757The default qualifier is
2758.Dq Li clear .
2759.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
2760.Pq "TS event"
2761Count the cancelling of various kinds of requests in the data cache
2762address control unit of the CPU.
2763The qualifier
2764.Ar event-list
2765is a list of the following strings separated by
2766.Ql +
2767characters:
2768.Pp
2769.Bl -tag -width indent -compact
2770.It Li st-rb-full
2771Requests cancelled because no store request buffer was available.
2772.It Li 64k-conf
2773Requests that conflict due to 64K aliasing.
2774.El
2775.Pp
2776If
2777.Ar event-list
2778is not specified, then the default is to count both kinds of events.
2779.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
2780.Pq "TS event"
2781Count the completion of load split, store split, uncacheable split and
2782uncacheable load operations selected by qualifier
2783.Ar event-list .
2784The qualifier
2785.Ar event-list
2786is a
2787.Ql +
2788separated list of the following flags:
2789.Pp
2790.Bl -tag -width indent -compact
2791.It Li lsc
2792Count load splits completed, excluding loads from uncacheable or
2793write-combining areas.
2794.It Li ssc
2795Count any split stores completed.
2796.El
2797.Pp
2798The default is to count both kinds of operations.
2799.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
2800.Pq "TS event"
2801Count load replays triggered by the memory order buffer.
2802Qualifier
2803.Ar qualifier
2804can be a
2805.Ql +
2806separated list of the following flags:
2807.Pp
2808.Bl -tag -width indent -compact
2809.It Li no-sta
2810Count replays because of unknown store addresses.
2811.It Li no-std
2812Count replays because of unknown store data.
2813.It Li partial-data
2814Count replays because of partially overlapped data accesses between
2815load and store operations.
2816.It Li unalgn-addr
2817Count replays because of mismatches in the lower 4 bits of load and
2818store operations.
2819.El
2820.Pp
2821The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
2823.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
2824.Pq "TI event"
2825Count packed double-precision uops.
2826Qualifier
2827.Ar flags
2828can take the following value (which is also the default):
2829.Pp
2830.Bl -tag -width indent -compact
2831.It Li all
2832Count all uops operating on packed double-precision operands.
2833.El
2834.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
2835.Pq "TI event"
2836Count packed single-precision uops.
2837Qualifier
2838.Ar flags
2839can take the following value (which is also the default):
2840.Pp
2841.Bl -tag -width indent -compact
2842.It Li all
2843Count all uops operating on packed single-precision operands.
2844.El
2845.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
2846.Pq "TI event"
2847Count page walks performed by the page miss handler.
2848Qualifier
2849.Ar qualifier
2850can be a
2851.Ql +
2852separated list of the following keywords:
2853.Pp
2854.Bl -tag -width indent -compact
2855.It Li dtmiss
2856Count page walks for data TLB misses.
2857.It Li itmiss
2858Count page walks for instruction TLB misses.
2859.El
2860.Pp
2861The default value for
2862.Ar qualifier
2863is
2864.Dq Li dtmiss+itmiss .
2865.It Li p4-replay-event Op Li ,mask= Ns Ar flags
2866.Pq "TS event"
2867Count the retirement of tagged uops selected through the replay
2868tagging mechanism.
2869Qualifier
2870.Ar flags
2871contains a
2872.Ql +
2873separated set of the following strings:
2874.Pp
2875.Bl -tag -width indent -compact
2876.It Li nbogus
2877The marked uops are not bogus.
2878.It Li bogus
2879The marked uops are bogus.
2880.El
2881.Pp
2882This event requires additional (upstream) events to be allocated to
2883perform the desired uop tagging.
2884The default qualifier counts both kinds of uops.
2885This event can be used for precise event based sampling.
2886.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
2887.Pq "TS event"
2888Count the occurrence or latency of stalls in the allocator.
2889Qualifier
2890.Ar flags
2891can take the following value (which is also the default):
2892.Pp
2893.Bl -tag -width indent -compact
2894.It Li sbfull
2895A stall due to the lack of store buffers.
2896.El
2897.It Li p4-response
2898.Pq "TI event"
2899Count different types of responses.
2900Further documentation on this event is not available.
2901.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
2902.Pq "TS event"
2903Count branches retired.
2904Qualifier
2905.Ar flags
2906contains a
2907.Ql +
2908separated list of strings:
2909.Pp
2910.Bl -tag -width indent -compact
2911.It Li conditional
2912Count conditional jumps.
2913.It Li call
2914Count direct and indirect call branches.
2915.It Li return
2916Count return branches.
2917.It Li indirect
2918Count returns, indirect calls or indirect jumps.
2919.El
2920.Pp
2921The default qualifier counts all the above branch types.
2922.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
2923.Pq "TS event"
2924Count mispredicted branches retired.
2925Qualifier
2926.Ar flags
2927contains a
2928.Ql +
2929separated list of strings:
2930.Pp
2931.Bl -tag -width indent -compact
2932.It Li conditional
2933Count conditional jumps.
2934.It Li call
2935Count indirect call branches.
2936.It Li return
2937Count return branches.
2938.It Li indirect
2939Count returns, indirect calls or indirect jumps.
2940.El
2941.Pp
2942The default qualifier counts all the above branch types.
2943.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
2944.Pq "TI event"
2945Count the number of scalar double-precision uops.
2946Qualifier
2947.Ar flags
2948can take the following value (which is also the default):
2949.Pp
2950.Bl -tag -width indent -compact
2951.It Li all
Count all uops operating on scalar double-precision operands.
2953.El
2954.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
2955.Pq "TI event"
2956Count the number of scalar single-precision uops.
2957Qualifier
2958.Ar flags
2959can take the following value (which is also the default):
2960.Pp
2961.Bl -tag -width indent -compact
2962.It Li all
2963Count all uops operating on scalar single-precision operands.
2964.El
2965.It Li p4-snoop
2966.Pq "TI event"
2967Count snoop traffic.
2968Further documentation on this event is not available.
2969.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
2970.Pq "TI event"
2971Count the number of times an assist is required to handle problems
2972with the operands for SSE and SSE2 operations.
2973Qualifier
2974.Ar flags
2975can take the following value (which is also the default):
2976.Pp
2977.Bl -tag -width indent -compact
2978.It Li all
2979Count assists for all SSE and SSE2 uops.
2980.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, in machine clear, or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, in machine clear, or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, in machine clear, or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package, then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
.It Li p4-uop-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
This event is used in conjunction with the front-end at-retirement
mechanism to tag load and store uops.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li tagloads
Mark uops that are load operations.
.It Li tagstores
Mark uops that are store operations.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count uops retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count marked uops that are not bogus.
.It Li bogus
Count marked uops that are bogus.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count write-combining buffer operations.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li wcb-evicts
WC buffer evictions due to any cause.
.It Li wcb-full-evict
WC buffer evictions due to no WC buffer being available.
.El
.Pp
The default qualifier counts both kinds of evictions.
.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of x87 instructions that required special
handling.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li fpsu
Count instructions that saw an FP stack underflow.
.It Li fpso
Count instructions that saw an FP stack overflow.
.It Li poao
Count instructions that saw an x87 output overflow.
.It Li poau
Count instructions that saw an x87 output underflow.
.It Li prea
Count instructions that needed an x87 input assist.
.El
.Pp
The default qualifier counts all the above types of instruction
retirements.
.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count x87 floating-point uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all x87 floating-point uops.
.El
.Pp
If an instruction contains more than one x87 floating-point uop, then
all x87 floating-point uops will be counted.
This event does not count x87 floating-point data movement operations.
.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
This event does not count integer move uops.
Qualifier
.Ar flags
may contain the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li allp0
Count all x87 and SIMD store and move uops.
.It Li allp2
Count all x87 and SIMD load uops.
.El
.Pp
The default is to count all uops.
.Pq Errata
This event may be affected by processor erratum N43.
.El
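.Pp
As an illustration of the
.Ql +
separated qualifier syntax used by the events listed above, the
following sketch in C assembles an event specifier string from an
event name and a list of mask keywords.
The helper
.Fn build_event_spec
is hypothetical and is not part of the
.Nm pmc
library; it only formats text and does not check that the keywords
are valid for the named event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of libpmc): write
 * "<event>,mask=<kw1>+<kw2>+..." into buf.  With no keywords only
 * the event name is written, leaving the event's default qualifier
 * in effect.  Returns 0 on success, -1 if the result does not fit.
 */
int
build_event_spec(char *buf, size_t buflen, const char *event,
    const char *const *keywords, size_t nkeywords)
{
	size_t i, used;
	int n;

	assert(buf != NULL && event != NULL);

	n = snprintf(buf, buflen, "%s", event);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	used = (size_t)n;

	for (i = 0; i < nkeywords; i++) {
		/* The first keyword follows ",mask=", later ones follow "+". */
		n = snprintf(buf + used, buflen - used, "%s%s",
		    i == 0 ? ",mask=" : "+", keywords[i]);
		if (n < 0 || (size_t)n >= buflen - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

Called with the event name
.Dq Li p4-x87-assist
and the keywords
.Dq Li fpsu
and
.Dq Li fpso ,
the helper produces
.Dq Li p4-x87-assist,mask=fpsu+fpso .
A string of this form could then be supplied as the event specifier
argument of
.Fn pmc_allocate .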
.Ss "Cascading P4 PMCs"
Support for cascading PMCs is currently incomplete.
While individual event counters may be allocated with a
.Dq Li cascade
qualifier, the current API does not offer the ability
to name and allocate all the resources needed for a
cascaded event counter pair in a single operation.
.Ss "Precise Event Based Sampling"
Support for precise event based sampling is currently
unimplemented.
.Sh COMPATIBILITY
The interface between the
.Nm pmc
library and the
.Xr hwpmc 4
driver is intended to be private to the implementation and may
change.
In order to ease forward compatibility with future versions of the
.Xr hwpmc 4
driver, applications are urged to dynamically link with the
.Nm pmc
library.
.Pp
The
.Nm pmc
API is
.Ud
.Sh SEE ALSO
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr pmcstat 8
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Nm pmc
library was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .