xref: /freebsd/lib/libpmc/pmc.3 (revision 1e413cf93298b5b97441a21d9a50fdcd0ee9945e)
1.\" Copyright (c) 2003-2007 Joseph Koshy.  All rights reserved.
2.\"
3.\" Redistribution and use in source and binary forms, with or without
4.\" modification, are permitted provided that the following conditions
5.\" are met:
6.\" 1. Redistributions of source code must retain the above copyright
7.\"    notice, this list of conditions and the following disclaimer.
8.\" 2. Redistributions in binary form must reproduce the above copyright
9.\"    notice, this list of conditions and the following disclaimer in the
10.\"    documentation and/or other materials provided with the distribution.
11.\"
12.\" This software is provided by Joseph Koshy ``as is'' and
13.\" any express or implied warranties, including, but not limited to, the
14.\" implied warranties of merchantability and fitness for a particular purpose
15.\" are disclaimed.  in no event shall Joseph Koshy be liable
16.\" for any direct, indirect, incidental, special, exemplary, or consequential
17.\" damages (including, but not limited to, procurement of substitute goods
18.\" or services; loss of use, data, or profits; or business interruption)
19.\" however caused and on any theory of liability, whether in contract, strict
20.\" liability, or tort (including negligence or otherwise) arising in any way
21.\" out of the use of this software, even if advised of the possibility of
22.\" such damage.
23.\"
24.\" $FreeBSD$
25.\"
26.Dd November 25, 2007
27.Os
28.Dt PMC 3
29.Sh NAME
30.Nm pmc
31.Nd library for accessing hardware performance monitoring counters
32.Sh LIBRARY
33.Lb libpmc
34.Sh SYNOPSIS
35.In pmc.h
36.Sh DESCRIPTION
37The
38.Lb libpmc
39provides a programming interface that allows applications to use
40hardware performance counters to gather performance data about
41specific processes or for the system as a whole.
42The library is implemented using the lower-level facilities offered by
43the
44.Xr hwpmc 4
45driver.
46.Ss Key Concepts
47Performance monitoring counters (PMCs) are represented by the library
48using a software abstraction.
49These
50.Dq abstract
PMCs can have one of two scopes:
52.Bl -bullet
53.It
54System scope.
55These PMCs measure events in a whole-system manner, i.e., independent
56of the currently executing thread.
57System scope PMCs are allocated on specific CPUs and do not
58migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
60.Xr hwpmc 4
61sysctl tunable:
62.Va security.bsd.unprivileged_syspmcs
63is non-zero.
64.It
65Process scope.
66These PMCs only measure hardware events when the processes they are
67attached to are executing on a CPU.
68In an SMP system, process scope PMCs migrate between CPUs along with
69their target processes.
70.El
71.Pp
72Orthogonal to PMC scope, PMCs may be allocated in one of two
73operational modes:
74.Bl -bullet
75.It
76Counting PMCs measure events according to their scope
77(system or process).
78The application needs to explicitly read these counters
79to retrieve their value.
80.It
81Sampling PMCs cause the CPU to be periodically interrupted
82and information about its state of execution to be collected.
83Sampling PMCs are used to profile specific processes and kernel
84threads or to profile the system as a whole.
85.El
86.Pp
87The scope and operational mode for a software PMC are specified at
88PMC allocation time.
89An application is allowed to allocate multiple PMCs subject
90to availability of hardware resources.
91.Pp
92The library uses human-readable strings to name the event being
93measured by hardware.
94The syntax used for specifying a hardware event along with additional
95event specific qualifiers (if any) is described in detail in section
96.Sx "EVENT SPECIFIERS"
97below.
98.Pp
99PMCs are associated with the process that allocated them and
100will be automatically reclaimed by the system when the process exits.
101Additionally, process-scope PMCs have to be attached to one or more
102target processes before they can perform measurements.
103A process-scope PMC may be attached to those target processes
104that its owner process would otherwise be permitted to debug.
105An owner process may attach PMCs to itself allowing
106it to measure its own behavior.
107Additionally, on some machine architectures, such self-attached PMCs
108may be read cheaply using specialized instructions supported by the
109processor.
110.Pp
111Certain kinds of PMCs require that a log file be configured before
112they may be started.
113These include:
114.Bl -bullet -compact
115.It
116System scope sampling PMCs.
117.It
118Process scope sampling PMCs.
119.It
120Process scope counting PMCs that have been configured to report PMC
121readings on process context switches or process exits.
122.El
Up to one log file may be configured per owner process.
124Events logged to a log file may be subsequently analyzed using the
125.Xr pmclog 3
126family of functions.
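.Pp
The log file rule above can be restated as a small predicate.
This is only a sketch: the enumerations and the function
.Fn needs_logfile
are hypothetical names local to the example and are not part of the library.

```c
#include <stdbool.h>

/* Sketch-local enumerations; these are not libpmc types. */
enum scope { SCOPE_SYSTEM, SCOPE_PROCESS };
enum opmode { OP_COUNTING, OP_SAMPLING };

/*
 * Return true if a PMC with the given scope and operational mode needs
 * a log file configured before it may be started.  The last argument
 * says whether readings are to be reported on context switches or
 * process exits; it only matters for process scope counting PMCs.
 */
static bool
needs_logfile(enum scope s, enum opmode m, bool log_on_csw_or_exit)
{
	if (m == OP_SAMPLING)
		return (true);	/* sampling PMCs of either scope */
	/* Counting PMCs: only process scope with reporting requested. */
	return (s == SCOPE_PROCESS && log_on_csw_or_exit);
}
```

A plain counting PMC of either scope therefore needs no log file in this model.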
127.Ss Supported CPUs
128The CPUs known to the PMC library are named by the
129.Vt "enum pmc_cputype"
130enumeration.
131Supported CPUs include:
132.Bl -tag -width PMC_CPU_INTEL_PIII -compact
133.It PMC_CPU_AMD_K7
134.Tn "AMD Athlon"
135CPUs.
136.It PMC_CPU_AMD_K8
137.Tn "AMD Athlon64"
138CPUs.
139.It PMC_CPU_INTEL_P6
140.Tn Intel
141.Tn "Pentium Pro"
142CPUs.
143.It PMC_CPU_INTEL_PII
144.Tn "Intel Pentium II"
145CPUs.
146.It PMC_CPU_INTEL_PIII
147.Tn "Intel Pentium III"
148CPUs.
149.It PMC_CPU_INTEL_PM
150.Tn "Intel Pentium M"
151CPUs.
152.It PMC_CPU_INTEL_PIV
153.Tn "Intel Pentium 4"
154CPUs.
155.El
156.Ss Supported PMCs
PMCs supported by this library are named by the
158.Vt enum pmc_class
159enumeration.
160Supported PMC kinds include:
161.Bl -tag -width PMC_CLASS_TSC -compact
162.It PMC_CLASS_TSC
163The timestamp counter on i386 and amd64 architecture CPUs.
164.It PMC_CLASS_K7
165Programmable hardware counters present in
166.Tn "AMD Athlon"
167CPUs.
168.It PMC_CLASS_K8
169Programmable hardware counters present in
170.Tn "AMD Athlon64"
171CPUs.
172.It PMC_CLASS_P6
173Programmable hardware counters present in
174.Tn Intel
175.Tn "Pentium Pro" ,
176.Tn "Pentium II" ,
177.Tn "Pentium III" ,
178.Tn "Celeron" ,
179and
180.Tn "Pentium M"
181CPUs.
182.It PMC_CLASS_P4
183Programmable hardware counters present in
184.Tn "Intel Pentium 4"
185CPUs.
186.El
187.Ss PMC Capabilities
188.Pp
189Capabilities of performance monitoring hardware are denoted using
190the
191.Vt "enum pmc_caps"
192enumeration.
193Supported capabilities include:
194.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
195.It PMC_CAP_EDGE
The ability to count negated-to-asserted transitions of the hardware
197conditions being probed for.
198.It PMC_CAP_INTERRUPT
199The ability to interrupt the CPU.
200.It PMC_CAP_INVERT
201The ability to invert the sense of the hardware conditions being
202measured.
203.It PMC_CAP_READ
204PMC hardware allows the CPU to read performance counters.
205.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
207system dependent way.
208.It PMC_CAP_SYSTEM
209The ability to restrict counting of hardware events to when the CPU is
210running privileged code.
211.It PMC_CAP_THRESHOLD
212The ability to ignore simultaneous hardware events below a
213programmable threshold.
214.It PMC_CAP_USER
The ability to restrict counting of hardware events to when the
CPU is running unprivileged code.
217.It PMC_CAP_WRITE
PMC hardware allows the CPU to write to performance counters.
219.El
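.Pp
Assuming the capabilities of a PMC are reported as a set of bit flags,
an application could test for the ones it needs as sketched below.
The
.Dv CAP_*
macros and
.Fn caps_has_all
are hypothetical; the bit positions are arbitrary and do not reflect
the values of the real
.Dv PMC_CAP_*
constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch-local bit assignments; NOT the real PMC_CAP_* values. */
#define CAP_INTERRUPT	(1u << 0)
#define CAP_READ	(1u << 1)
#define CAP_WRITE	(1u << 2)
#define CAP_USER	(1u << 3)
#define CAP_SYSTEM	(1u << 4)

/* Return true if every capability in 'wanted' is present in 'caps'. */
static bool
caps_has_all(uint32_t caps, uint32_t wanted)
{
	return ((caps & wanted) == wanted);
}
```

For example, a tool that wants to both read and write a counter would
pass
.Li CAP_READ | CAP_WRITE
as the second argument.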
220.Ss Functional Grouping
221This section contains a brief overview of the available functionality
222in the PMC library.
223Each function listed here is described further in its own manual page.
224.Bl -tag -width indent
225.It Administration
226.Bl -tag -compact
227.It Fn pmc_disable , Fn pmc_enable
228Administratively disable (enable) specific performance monitoring
229counter hardware.
230Counters that are disabled will not be available to applications to
231use.
232.El
233.It "Convenience Functions"
234.Bl -tag -compact
235.It Fn pmc_event_names_of_class
236Returns a list of event names supported by a given PMC type.
237.It Fn pmc_name_of_capability
238Convert a
239.Dv PMC_CAP_*
240flag to a human-readable string.
241.It Fn pmc_name_of_class
242Convert a
243.Dv PMC_CLASS_*
244constant to a human-readable string.
245.It Fn pmc_name_of_cputype
246Return a human-readable name for a CPU type.
247.It Fn pmc_name_of_disposition
248Return a human-readable string describing a PMC's disposition.
249.It Fn pmc_name_of_event
250Convert a numeric event code to a human-readable string.
251.It Fn pmc_name_of_mode
252Convert a
253.Dv PMC_MODE_*
254constant to a human-readable name.
255.It Fn pmc_name_of_state
256Return a human-readable string describing a PMC's current state.
257.El
258.It "Library Initialization"
259.Bl -tag -compact
260.It Fn pmc_init
261Initialize the library.
262This function must be called before any other library function.
263.El
264.It "Log File Handling"
265.Bl -tag -compact
266.It Fn pmc_configure_logfile
267Configure a log file for
268.Xr hwpmc 4
269to write logged events to.
270.It Fn pmc_flush_logfile
271Flush all pending log data in
272.Xr hwpmc 4 Ns Ap s
273buffers.
274.It Fn pmc_writelog
275Append arbitrary user data to the current log file.
276.El
277.It "PMC Management"
278.Bl -tag -compact
279.It Fn pmc_allocate , Fn pmc_release
280Allocate (free) a PMC.
281.It Fn pmc_attach , Fn pmc_detach
282Attach (detach) a process scope PMC to a target.
283.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
284Read (write) a value from (to) a PMC.
285.It Fn pmc_start , Fn pmc_stop
286Start (stop) a software PMC.
287.It Fn pmc_set
288Set the reload value for a sampling PMC.
289.El
290.It "Queries"
291.Bl -tag -compact
292.It Fn pmc_capabilities
293Retrieve the capabilities for a given PMC.
294.It Fn pmc_cpuinfo
295Retrieve information about the CPUs and PMC hardware present in the
296system.
297.It Fn pmc_get_driver_stats
298Retrieve statistics maintained by
299.Xr hwpmc 4 .
300.It Fn pmc_ncpu
301Determine the number of CPUs in the system.
302.It Fn pmc_npmc
303Return the number of hardware PMCs present in a given CPU.
304.It Fn pmc_pmcinfo
305Return information about the state of a given CPU's PMCs.
306.It Fn pmc_width
307Determine the width of a hardware counter in bits.
308.El
309.It "x86 Architecture Specific API"
310.Bl -tag -compact
311.It Fn pmc_get_msr
312Returns the processor model specific register number
313associated with
314.Fa pmc .
315Applications may then use the x86
316.Ic RDPMC
317instruction to directly read the contents of the PMC.
318.El
319.El
320.Ss Signal Handling Requirements
321Applications using PMCs are required to handle the following signals:
322.Bl -tag -width ".Dv SIGBUS"
323.It Dv SIGBUS
324When the
325.Xr hwpmc 4
326module is unloaded using
327.Xr kldunload 8 ,
328processes that have PMCs allocated to them will be sent a
329.Dv SIGBUS
330signal.
331.It Dv SIGIO
332The
333.Xr hwpmc 4
334driver will send a PMC owning process a
335.Dv SIGIO
signal if:
.Bl -bullet
.It
Any process scope PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
configured log file.
344This error may be retrieved by a subsequent call to
345.Fn pmc_flush_logfile .
346.El
347.El
348.Ss Typical Program Flow
349.Bl -enum
350.It
351An application would first invoke function
352.Fn pmc_init
353to allow the library to initialize itself.
354.It
355Signal handling would then be set up.
356.It
357Next the application would allocate the PMCs it desires using function
358.Fn pmc_allocate .
359.It
360Initial values for PMCs may be set using function
361.Fn pmc_set .
362.It
363If a log file is necessary for the PMCs to work, it would
364be configured using function
365.Fn pmc_configure_logfile .
366.It
367Process scope PMCs would then be attached to their target processes
368using function
369.Fn pmc_attach .
370.It
371The PMCs would then be started using function
372.Fn pmc_start .
373.It
374Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
376For PMCs that write events to the log file, this logged data would be
377read and parsed using the
378.Xr pmclog 3
379family of functions.
380.It
381PMCs are stopped using function
382.Fn pmc_stop ,
383and process scope PMCs are detached from their targets using
384function
385.Fn pmc_detach .
386.It
Before the process exits, it may release its PMCs using function
388.Fn pmc_release .
389Any configured log file may be closed using function
390.Fn pmc_configure_logfile .
391.El
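.Pp
The ordering constraints in the steps above can be modelled as a check
that reports which step is still missing before a PMC may be started.
This is a sketch only: it does not call the library, and
.Vt "struct pmc_plan"
and
.Fn missing_step_before_start
are hypothetical names.
The strings it returns are the names of the library functions listed above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch-local bookkeeping for one PMC; not a libpmc structure. */
struct pmc_plan {
	bool lib_initialized;		/* pmc_init() has succeeded */
	bool allocated;			/* pmc_allocate() has succeeded */
	bool process_scope;		/* false means system scope */
	bool needs_logfile;		/* this kind of PMC logs events */
	bool logfile_configured;	/* pmc_configure_logfile() done */
	unsigned ntargets;		/* successful pmc_attach() calls */
};

/*
 * Return NULL if the PMC may be started, otherwise the name of the
 * first step that has not been performed yet.
 */
static const char *
missing_step_before_start(const struct pmc_plan *p)
{
	if (!p->lib_initialized)
		return ("pmc_init");
	if (!p->allocated)
		return ("pmc_allocate");
	if (p->needs_logfile && !p->logfile_configured)
		return ("pmc_configure_logfile");
	if (p->process_scope && p->ntargets == 0)
		return ("pmc_attach");
	return (NULL);
}
```

A system scope counting PMC passes the check as soon as it has been
allocated, since it needs neither a log file nor a target process.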
392.Sh EVENT SPECIFIERS
Event specifiers are strings comprising an event name, followed by
394optional parameters modifying the semantics of the hardware event
395being probed.
396Event names are PMC architecture dependent, but the PMC library defines
397machine independent aliases for commonly used events.
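.Pp
As an illustration of this layout, the sketch below splits a specifier
into its event name and qualifiers.
It assumes that qualifiers are separated from the event name and from
each other by commas, as the
.Dq Li ,unitmask=
and
.Dq Li ,mask=
forms shown later suggest.
.Vt "struct evspec"
and
.Fn evspec_split
are hypothetical names and do not describe how the library itself
parses specifiers.

```c
#include <stddef.h>
#include <string.h>

#define MAX_QUALIFIERS 8

struct evspec {
	char buf[128];				/* private copy of the string */
	const char *name;			/* event name */
	const char *qual[MAX_QUALIFIERS];	/* qualifiers, in order */
	size_t nqual;
};

/*
 * Split "name,qual1,qual2=..." at commas.  Returns 0 on success, or -1
 * if the string is too long, the event name is empty, or there are
 * more than MAX_QUALIFIERS qualifiers.
 */
static int
evspec_split(const char *spec, struct evspec *out)
{
	char *p, *comma;

	if (spec[0] == '\0' || spec[0] == ',' ||
	    strlen(spec) >= sizeof(out->buf))
		return (-1);
	strcpy(out->buf, spec);		/* length was checked above */
	out->name = out->buf;
	out->nqual = 0;
	p = out->buf;
	while ((comma = strchr(p, ',')) != NULL) {
		*comma = '\0';		/* terminate the previous field */
		if (out->nqual == MAX_QUALIFIERS)
			return (-1);
		p = comma + 1;
		out->qual[out->nqual++] = p;
	}
	return (0);
}
```

Given
.Dq Li k7-dc-writebacks,unitmask=m+o,usr ,
the function stores the event name
.Dq Li k7-dc-writebacks
and the two qualifiers
.Dq Li unitmask=m+o
and
.Dq Li usr .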
398.Ss Event Name Aliases
399Event name aliases are CPU architecture independent names for commonly
400used events.
401The following aliases are known to this version of the
402.Nm pmc
403library:
404.Bl -tag -width indent
405.It Li branches
406Measure the number of branches retired.
407.It Li branch-mispredicts
408Measure the number of retired branches that were mispredicted.
409.It Li cycles
410Measure processor cycles.
411This event is implemented using the processor's Time Stamp Counter
412register.
413.It Li dc-misses
414Measure the number of data cache misses.
415.It Li ic-misses
416Measure the number of instruction cache misses.
417.It Li instructions
418Measure the number of instructions retired.
419.It Li interrupts
420Measure the number of interrupts seen.
421.It Li unhalted-cycles
422Measure the number of cycles the processor is not in a halted
423or sleep state.
424.El
425.Ss Time Stamp Counter (TSC)
426The timestamp counter is a monotonically non-decreasing counter that
427counts processor cycles.
428.Pp
429In the i386 architecture, this counter may
430be selected by requesting an event with event specifier
431.Dq Li tsc .
432The
433.Dq Li tsc
434event does not support any further qualifiers.
435It can only be allocated in system-wide counting mode,
436and is a read-only counter.
437Multiple processes are allowed to allocate the TSC.
438Once allocated, it may be read using the
439.Fn pmc_read
440function, or by using the RDTSC instruction.
441.Ss AMD (K7) PMCs
442These PMCs are present in the
443.Tn "AMD Athlon"
444series of CPUs and are documented in:
445.Rs
446.%B "AMD Athlon Processor x86 Code Optimization Guide"
447.%N "Publication No. 22007"
448.%D "February 2002"
449.%Q "Advanced Micro Devices, Inc."
450.Re
451.Pp
452Event specifiers for AMD K7 PMCs can have the following optional
453qualifiers:
454.Bl -tag -width indent
455.It Li count= Ns Ar value
456Configure the counter to increment only if the number of configured
457events measured in a cycle is greater than or equal to
458.Ar value .
459.It Li edge
460Configure the counter to only count negated-to-asserted transitions
461of the conditions expressed by the other qualifiers.
462In other words, the counter will increment only once whenever a given
463condition becomes true, irrespective of the number of clocks during
464which the condition remains true.
465.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
469number of events per cycle is less than the value specified by
470the
471.Dq Li count
472qualifier.
473.It Li os
474Configure the PMC to count events happening at privilege level 0.
475.It Li unitmask= Ns Ar mask
476This qualifier is used to further qualify a select few events,
477.Dq Li k7-dc-refills-from-l2 ,
478.Dq Li k7-dc-refills-from-system
479and
480.Dq Li k7-dc-writebacks .
481Here
482.Ar mask
483is a string of the following characters optionally separated by
484.Ql +
485characters:
486.Pp
487.Bl -tag -width indent -compact
488.It Li m
489Count operations for lines in the
490.Dq Modified
491state.
492.It Li o
493Count operations for lines in the
494.Dq Owner
495state.
496.It Li e
497Count operations for lines in the
498.Dq Exclusive
499state.
500.It Li s
501Count operations for lines in the
502.Dq Shared
503state.
504.It Li i
505Count operations for lines in the
506.Dq Invalid
507state.
508.El
509.Pp
510If no
511.Dq Li unitmask
qualifier is specified, the default is to count events for cache
513lines in any of the above states.
514.It Li usr
515Configure the PMC to count events occurring at privilege levels 1, 2
516or 3.
517.El
518.Pp
519If neither of the
520.Dq Li os
521or
522.Dq Li usr
523qualifiers were specified, the default is to enable both.
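.Pp
The
.Dq Li unitmask
syntax described above can be checked with a small routine such as the
following sketch.
The
.Dv LINE_*
flags and
.Fn k7_unitmask_parse
are hypothetical; the flag values are arbitrary and are not the
hardware unit mask encoding.
Rejecting a mask that selects no state is a choice made for this
sketch, not documented library behavior.

```c
#include <stddef.h>

/* Sketch-local flag bits; NOT the hardware unit mask encoding. */
enum {
	LINE_M = 1 << 0,	/* Modified */
	LINE_O = 1 << 1,	/* Owner */
	LINE_E = 1 << 2,	/* Exclusive */
	LINE_S = 1 << 3,	/* Shared */
	LINE_I = 1 << 4,	/* Invalid */
	LINE_ALL = LINE_M | LINE_O | LINE_E | LINE_S | LINE_I
};

/*
 * Parse a unitmask value such as "m+o" or "moesi".  A NULL mask models
 * an absent qualifier and selects every state.  Returns the selected
 * states, or -1 for an unknown character or a mask naming no state.
 */
static int
k7_unitmask_parse(const char *mask)
{
	int bits = 0;

	if (mask == NULL)
		return (LINE_ALL);
	for (; *mask != '\0'; mask++) {
		switch (*mask) {
		case 'm': bits |= LINE_M; break;
		case 'o': bits |= LINE_O; break;
		case 'e': bits |= LINE_E; break;
		case 's': bits |= LINE_S; break;
		case 'i': bits |= LINE_I; break;
		case '+': break;	/* optional separator */
		default: return (-1);
		}
	}
	return (bits == 0 ? -1 : bits);
}
```

Both
.Dq Li m+o
and
.Dq Li mo
select the same two states, since the
.Ql +
separator is optional.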
524.Pp
525The event specifiers supported on AMD K7 PMCs are:
526.Bl -tag -width indent
527.It Li k7-dc-accesses
528Count data cache accesses.
529.It Li k7-dc-misses
530Count data cache misses.
531.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
532Count data cache refills from L2 cache.
533This event may be further qualified using the
534.Dq Li unitmask
535qualifier.
536.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
537Count data cache refills from system memory.
538This event may be further qualified using the
539.Dq Li unitmask
540qualifier.
541.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
542Count data cache writebacks.
543This event may be further qualified using the
544.Dq Li unitmask
545qualifier.
546.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
547Count L1 DTLB misses and L2 DTLB hits.
548.It Li k7-l1-and-l2-dtlb-misses
549Count L1 and L2 DTLB misses.
550.It Li k7-misaligned-references
551Count misaligned data references.
552.It Li k7-ic-fetches
553Count instruction cache fetches.
554.It Li k7-ic-misses
555Count instruction cache misses.
556.It Li k7-l1-itlb-misses
557Count L1 ITLB misses that are L2 ITLB hits.
558.It Li k7-l1-l2-itlb-misses
559Count L1 (and L2) ITLB misses.
560.It Li k7-retired-instructions
561Count all retired instructions.
562.It Li k7-retired-ops
563Count retired ops.
564.It Li k7-retired-branches
565Count all retired branches (conditional, unconditional, exceptions
566and interrupts).
567.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
569.It Li k7-retired-taken-branches
570Count retired taken branches.
571.It Li k7-retired-taken-branches-mispredicted
572Count mispredicted taken branches that were retired.
573.It Li k7-retired-far-control-transfers
574Count retired far control transfers.
575.It Li k7-retired-resync-branches
Count retired resync branches (non-control transfer branches).
577.It Li k7-interrupts-masked-cycles
578Count the number of cycles when the processor's
579.Va IF
580flag was zero.
581.It Li k7-interrupts-masked-while-pending-cycles
582Count the number of cycles interrupts were masked while pending due
583to the processor's
584.Va IF
585flag being zero.
586.It Li k7-hardware-interrupts
587Count the number of taken hardware interrupts.
588.El
589.Ss AMD (K8) PMCs
590These PMCs are present in the
591.Tn "AMD Athlon64"
592and
593.Tn "AMD Opteron"
594series of CPUs.
595They are documented in:
596.Rs
597.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
598.%N "Publication No. 26094"
599.%D "April 2004"
600.%Q "Advanced Micro Devices, Inc."
601.Re
602.Pp
603Event specifiers for AMD K8 PMCs can have the following optional
604qualifiers:
605.Bl -tag -width indent
606.It Li count= Ns Ar value
607Configure the counter to increment only if the number of configured
608events measured in a cycle is greater than or equal to
609.Ar value .
610.It Li edge
611Configure the counter to only count negated-to-asserted transitions
612of the conditions expressed by the other fields.
613In other words, the counter will increment only once whenever a given
614condition becomes true, irrespective of the number of clocks during
615which the condition remains true.
616.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
620number of events per cycle is less than the value specified by
621the
622.Dq Li count
623qualifier.
624.It Li mask= Ns Ar qualifier
625Many event specifiers for AMD K8 PMCs need to be additionally
626qualified using a mask qualifier.
627These additional qualifiers are event-specific and are documented
628along with their associated event specifiers below.
629.It Li os
630Configure the PMC to count events happening at privilege level 0.
631.It Li usr
632Configure the PMC to count events occurring at privilege levels 1, 2
633or 3.
634.El
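.Pp
A specifier carrying a
.Dq Li mask
qualifier can be assembled from its parts as in the sketch below.
The keywords are joined with
.Ql +
characters, following the syntax given with each event.
.Fn k8_build_spec
is a hypothetical helper and is not provided by the library.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "event,mask=kw1+kw2+..." into buf.  With nkw == 0 only the
 * event name is written, which leaves the event's default in effect.
 * Returns 0 on success, or -1 if buf is too small.
 */
static int
k8_build_spec(char *buf, size_t len, const char *event,
    const char *const *kw, size_t nkw)
{
	size_t i, used;
	int n;

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;
	for (i = 0; i < nkw; i++) {
		/* The first keyword follows ",mask=", the rest follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", kw[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

Called with the event
.Dq Li k8-dc-copyback
and the keywords
.Dq Li exclusive
and
.Dq Li modified ,
the helper produces
.Dq Li k8-dc-copyback,mask=exclusive+modified .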
635.Pp
636If neither of the
637.Dq Li os
638or
639.Dq Li usr
640qualifiers were specified, the default is to enable both.
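.Pp
The interaction of the
.Dq Li count ,
.Dq Li inv
and
.Dq Li edge
qualifiers can be illustrated with a software model that replays a
trace of per-cycle event counts.
This is a sketch of the semantics described above, not of the hardware:
.Fn simulate_counter
is a hypothetical function, and the model assumes that the counter
advances by one for each cycle (or each transition, with
.Dq Li edge )
in which the condition holds.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Replay 'ncycles' per-cycle event counts and return how far a counter
 * configured with "count=threshold" would advance, optionally with the
 * "inv" and "edge" qualifiers.
 */
static unsigned
simulate_counter(const unsigned *events, size_t ncycles,
    unsigned threshold, bool inv, bool edge)
{
	unsigned total = 0;
	bool prev = false;	/* condition in the previous cycle */
	size_t i;

	for (i = 0; i < ncycles; i++) {
		/* "inv" flips ">= threshold" into "< threshold". */
		bool cond = inv ? events[i] < threshold :
		    events[i] >= threshold;

		/* "edge" counts only negated-to-asserted transitions. */
		if (edge ? (cond && !prev) : cond)
			total++;
		prev = cond;
	}
	return (total);
}
```

With
.Dq Li edge
in effect, a run of consecutive cycles that satisfy the condition
contributes one count rather than one count per cycle.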
641.Pp
642The event specifiers supported on AMD K8 PMCs are:
643.Bl -tag -width indent
644.It Li k8-bu-cpu-clk-unhalted
645Count the number of clock cycles when the CPU is not in the HLT or
646STPCLK states.
647.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
648Count fill requests that missed in the L2 cache.
649This event may be further qualified using
650.Ar qualifier ,
651which is a
652.Ql +
653separated set of the following keywords:
654.Pp
655.Bl -tag -width indent -compact
656.It Li dc-fill
657Count data cache fill requests.
658.It Li ic-fill
659Count instruction cache fill requests.
660.It Li tlb-reload
661Count TLB reloads.
662.El
663.Pp
664The default is to count all types of requests.
665.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
666Count internally generated requests to the L2 cache.
667This event may be further qualified using
668.Ar qualifier ,
669which is a
670.Ql +
671separated set of the following keywords:
672.Pp
673.Bl -tag -width indent -compact
674.It Li cancelled
675Count cancelled requests.
676.It Li dc-fill
677Count data cache fill requests.
678.It Li ic-fill
679Count instruction cache fill requests.
680.It Li tag-snoop
681Count tag snoop requests.
682.It Li tlb-reload
683Count TLB reloads.
684.El
685.Pp
686The default is to count all types of requests.
687.It Li k8-dc-access
688Count data cache accesses including microcode scratchpad accesses.
689.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
690Count data cache copyback operations.
691This event may be further qualified using
692.Ar qualifier ,
693which is a
694.Ql +
695separated set of the following keywords:
696.Pp
697.Bl -tag -width indent -compact
698.It Li exclusive
699Count operations for lines in the
700.Dq exclusive
701state.
702.It Li invalid
703Count operations for lines in the
704.Dq invalid
705state.
706.It Li modified
707Count operations for lines in the
708.Dq modified
709state.
710.It Li owner
711Count operations for lines in the
712.Dq owner
713state.
714.It Li shared
715Count operations for lines in the
716.Dq shared
717state.
718.El
719.Pp
720The default is to count operations for lines in all the
721above states.
722.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
723Count data cache accesses by lock instructions.
724This event is only available on processors of revision C or later
725vintage.
726This event may be further qualified using
727.Ar qualifier ,
728which is a
729.Ql +
730separated set of the following keywords:
731.Pp
732.Bl -tag -width indent -compact
733.It Li accesses
734Count data cache accesses by lock instructions.
735.It Li misses
736Count data cache misses by lock instructions.
737.El
738.Pp
739The default is to count all accesses.
740.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
741Count the number of dispatched prefetch instructions.
742This event may be further qualified using
743.Ar qualifier ,
744which is a
745.Ql +
746separated set of the following keywords:
747.Pp
748.Bl -tag -width indent -compact
749.It Li load
750Count load operations.
751.It Li nta
752Count non-temporal operations.
753.It Li store
754Count store operations.
755.El
756.Pp
757The default is to count all operations.
758.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
759Count L1 DTLB misses that are L2 DTLB hits.
760.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
761Count L1 DTLB misses that are also misses in the L2 DTLB.
762.It Li k8-dc-microarchitectural-early-cancel-of-an-access
763Count microarchitectural early cancels of data cache accesses.
764.It Li k8-dc-microarchitectural-late-cancel-of-an-access
765Count microarchitectural late cancels of data cache accesses.
766.It Li k8-dc-misaligned-data-reference
767Count misaligned data references.
768.It Li k8-dc-miss
769Count data cache misses.
770.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
771Count one bit ECC errors found by the scrubber.
772This event may be further qualified using
773.Ar qualifier ,
774which is a
775.Ql +
776separated set of the following keywords:
777.Pp
778.Bl -tag -width indent -compact
779.It Li scrubber
780Count scrubber detected errors.
781.It Li piggyback
782Count piggyback scrubber errors.
783.El
784.Pp
785The default is to count both kinds of errors.
786.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
787Count data cache refills from L2 cache.
788This event may be further qualified using
789.Ar qualifier ,
790which is a
791.Ql +
792separated set of the following keywords:
793.Pp
794.Bl -tag -width indent -compact
795.It Li exclusive
796Count operations for lines in the
797.Dq exclusive
798state.
799.It Li invalid
800Count operations for lines in the
801.Dq invalid
802state.
803.It Li modified
804Count operations for lines in the
805.Dq modified
806state.
807.It Li owner
808Count operations for lines in the
809.Dq owner
810state.
811.It Li shared
812Count operations for lines in the
813.Dq shared
814state.
815.El
816.Pp
817The default is to count operations for lines in all the
818above states.
819.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
820Count data cache refills from system memory.
821This event may be further qualified using
822.Ar qualifier ,
823which is a
824.Ql +
825separated set of the following keywords:
826.Pp
827.Bl -tag -width indent -compact
828.It Li exclusive
829Count operations for lines in the
830.Dq exclusive
831state.
832.It Li invalid
833Count operations for lines in the
834.Dq invalid
835state.
836.It Li modified
837Count operations for lines in the
838.Dq modified
839state.
840.It Li owner
841Count operations for lines in the
842.Dq owner
843state.
844.It Li shared
845Count operations for lines in the
846.Dq shared
847state.
848.El
849.Pp
850The default is to count operations for lines in all the
851above states.
852.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
853Count the number of dispatched FPU ops.
854This event is supported in revision B and later CPUs.
855This event may be further qualified using
856.Ar qualifier ,
857which is a
858.Ql +
859separated set of the following keywords:
860.Pp
861.Bl -tag -width indent -compact
862.It Li add-pipe-excluding-junk-ops
863Count add pipe ops excluding junk ops.
864.It Li add-pipe-junk-ops
865Count junk ops in the add pipe.
866.It Li multiply-pipe-excluding-junk-ops
867Count multiply pipe ops excluding junk ops.
868.It Li multiply-pipe-junk-ops
869Count junk ops in the multiply pipe.
870.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
872.It Li store-pipe-junk-ops
873Count junk ops in the store pipe.
874.El
875.Pp
876The default is to count all types of ops.
877.It Li k8-fp-cycles-with-no-fpu-ops-retired
878Count cycles when no FPU ops were retired.
879This event is supported in revision B and later CPUs.
880.It Li k8-fp-dispatched-fpu-fast-flag-ops
881Count dispatched FPU ops that use the fast flag interface.
882This event is supported in revision B and later CPUs.
883.It Li k8-fr-decoder-empty
884Count cycles when there was nothing to dispatch (i.e., the decoder
885was empty).
886.It Li k8-fr-dispatch-stalls
887Count all dispatch stalls.
888.It Li k8-fr-dispatch-stall-for-segment-load
889Count dispatch stalls for segment loads.
890.It Li k8-fr-dispatch-stall-for-serialization
891Count dispatch stalls for serialization.
892.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
893Count dispatch stalls from branch abort to retiral.
894.It Li k8-fr-dispatch-stall-when-fpu-is-full
895Count dispatch stalls when the FPU is full.
896.It Li k8-fr-dispatch-stall-when-ls-is-full
897Count dispatch stalls when the load/store unit is full.
898.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
899Count dispatch stalls when the reorder buffer is full.
900.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
901Count dispatch stalls when reservation stations are full.
902.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
903Count dispatch stalls when waiting for all to be quiet.
904.\" XXX What does "waiting for all to be quiet" mean?
905.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
906Count dispatch stalls when a far control transfer or a resync branch
907is pending.
908.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
909Count FPU exceptions.
910This event is supported in revision B and later CPUs.
911This event may be further qualified using
912.Ar qualifier ,
913which is a
914.Ql +
915separated set of the following keywords:
916.Pp
917.Bl -tag -width indent -compact
918.It Li sse-and-x87-microtraps
919Count SSE and x87 microtraps.
920.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
924.It Li x87-reclass-microfaults
925Count x87 reclass microfaults.
926.El
927.Pp
928The default is to count all types of exceptions.
929.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., when CPU RFLAGS field IF was zero).
931.It Li k8-fr-interrupts-masked-while-pending-cycles
932Count cycles while interrupts were masked while pending (i.e., cycles
933when INTR was asserted while CPU RFLAGS field IF was zero).
934.It Li k8-fr-number-of-breakpoints-for-dr0
935Count the number of breakpoints for DR0.
936.It Li k8-fr-number-of-breakpoints-for-dr1
937Count the number of breakpoints for DR1.
938.It Li k8-fr-number-of-breakpoints-for-dr2
939Count the number of breakpoints for DR2.
940.It Li k8-fr-number-of-breakpoints-for-dr3
941Count the number of breakpoints for DR3.
942.It Li k8-fr-retired-branches
943Count retired branches including exceptions and interrupts.
944.It Li k8-fr-retired-branches-mispredicted
945Count mispredicted retired branches.
946.It Li k8-fr-retired-far-control-transfers
947Count retired far control transfers (which are always mispredicted).
948.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
949Count retired fastpath double op instructions.
950This event is supported in revision B and later CPUs.
951This event may be further qualified using
952.Ar qualifier ,
953which is a
954.Ql +
955separated set of the following keywords:
956.Pp
957.Bl -tag -width indent -compact
958.It Li low-op-pos-0
959Count instructions with the low op in position 0.
960.It Li low-op-pos-1
961Count instructions with the low op in position 1.
962.It Li low-op-pos-2
963Count instructions with the low op in position 2.
964.El
965.Pp
966The default is to count all types of instructions.
967.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
968Count retired FPU instructions.
969This event is supported in revision B and later CPUs.
970This event may be further qualified using
971.Ar qualifier ,
972which is a
973.Ql +
974separated set of the following keywords:
975.Pp
976.Bl -tag -width indent -compact
977.It Li mmx-3dnow
978Count MMX and 3DNow!\& instructions.
979.It Li packed-sse-sse2
980Count packed SSE and SSE2 instructions.
981.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
983.It Li x87
984Count x87 instructions.
985.El
986.Pp
987The default is to count all types of instructions.
988.It Li k8-fr-retired-near-returns
989Count retired near returns.
990.It Li k8-fr-retired-near-returns-mispredicted
991Count mispredicted near returns.
992.It Li k8-fr-retired-resyncs
993Count retired resyncs (non-control transfer branches).
994.It Li k8-fr-retired-taken-hardware-interrupts
995Count retired taken hardware interrupts.
996.It Li k8-fr-retired-taken-branches
997Count retired taken branches.
998.It Li k8-fr-retired-taken-branches-mispredicted
999Count retired taken branches that were mispredicted.
1000.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
1001Count retired taken branches that were mispredicted only due to an
1002address miscompare.
1003.It Li k8-fr-retired-uops
1004Count retired uops.
1005.It Li k8-fr-retired-x86-instructions
1006Count retired x86 instructions including exceptions and interrupts.
1007.It Li k8-ic-fetch
1008Count instruction cache fetches.
1009.It Li k8-ic-instruction-fetch-stall
1010Count cycles in stalls due to instruction fetch.
1011.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
1012Count L1 ITLB misses that are L2 ITLB hits.
1013.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
1014Count ITLB misses that miss in both L1 and L2 ITLBs.
1015.It Li k8-ic-microarchitectural-resync-by-snoop
1016Count microarchitectural resyncs caused by snoops.
1017.It Li k8-ic-miss
1018Count instruction cache misses.
1019.It Li k8-ic-refill-from-l2
1020Count instruction cache refills from L2 cache.
1021.It Li k8-ic-refill-from-system
1022Count instruction cache refills from system memory.
1023.It Li k8-ic-return-stack-hits
1024Count hits to the return stack.
1025.It Li k8-ic-return-stack-overflow
1026Count overflows of the return stack.
1027.It Li k8-ls-buffer2-full
1028Count load/store buffer2 full events.
1029.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
1030Count locked operations.
1031For revision C and later CPUs, the following qualifiers are supported:
1032.Pp
1033.Bl -tag -width indent -compact
1034.It Li cycles-in-request
1035Count the number of cycles in the lock request/grant stage.
1036.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
1039.It Li locked-instructions
1040Count the number of lock instructions executed.
1041.El
1042.Pp
1043The default is to count the number of lock instructions executed.
1044.It Li k8-ls-microarchitectural-late-cancel
1045Count microarchitectural late cancels of operations in the load/store
1046unit.
1047.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
1048Count microarchitectural resyncs caused by self-modifying code.
1049.It Li k8-ls-microarchitectural-resync-by-snoop
1050Count microarchitectural resyncs caused by snoops.
1051.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
1053.It Li k8-ls-retired-cpuid-instructions
1054Count retired CPUID instructions.
1055.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
1056Count segment register loads.
1057This event may be further qualified using
1058.Ar qualifier ,
1059which is a
1060.Ql +
separated set of the following keywords:
.Pp
1062.Bl -tag -width indent -compact
1063.It Li cs
1064Count CS register loads.
1065.It Li ds
1066Count DS register loads.
1067.It Li es
1068Count ES register loads.
1069.It Li fs
1070Count FS register loads.
1071.It Li gs
1072Count GS register loads.
1073.\" .It Li hs
1074.\" Count HS register loads.
1075.\" XXX "HS" register?
1076.It Li ss
1077Count SS register loads.
1078.El
1079.Pp
1080The default is to count all types of loads.
1081.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
1082Count memory controller bypass counter saturation events.
1083This event may be further qualified using
1084.Ar qualifier ,
1085which is a
1086.Ql +
1087separated set of the following keywords:
1088.Pp
1089.Bl -tag -width indent -compact
1090.It Li dram-controller-interface-bypass
1091Count DRAM controller interface bypass.
1092.It Li dram-controller-queue-bypass
1093Count DRAM controller queue bypass.
1094.It Li memory-controller-hi-pri-bypass
1095Count memory controller high priority bypasses.
1096.It Li memory-controller-lo-pri-bypass
1097Count memory controller low priority bypasses.
.El
1100.It Li k8-nb-memory-controller-dram-slots-missed
1101Count memory controller DRAM command slots missed (in MemClks).
1102.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
1103Count memory controller page access events.
1104This event may be further qualified using
1105.Ar qualifier ,
1106which is a
1107.Ql +
1108separated set of the following keywords:
1109.Pp
1110.Bl -tag -width indent -compact
1111.It Li page-conflict
1112Count page conflicts.
1113.It Li page-hit
1114Count page hits.
1115.It Li page-miss
1116Count page misses.
1117.El
1118.Pp
1119The default is to count all types of events.
1120.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
1122.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
1123Count probe events.
1124This event may be further qualified using
1125.Ar qualifier ,
1126which is a
1127.Ql +
1128separated set of the following keywords:
1129.Pp
1130.Bl -tag -width indent -compact
1131.It Li probe-hit
1132Count all probe hits.
1133.It Li probe-hit-dirty-no-memory-cancel
1134Count probe hits without memory cancels.
1135.It Li probe-hit-dirty-with-memory-cancel
1136Count probe hits with memory cancels.
1137.It Li probe-miss
1138Count probe misses.
1139.El
1140.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
1141Count sized commands issued.
1142This event may be further qualified using
1143.Ar qualifier ,
1144which is a
1145.Ql +
1146separated set of the following keywords:
1147.Pp
1148.Bl -tag -width indent -compact
1149.It Li nonpostwrszbyte
1150.It Li nonpostwrszdword
1151.It Li postwrszbyte
1152.It Li postwrszdword
1153.It Li rdszbyte
1154.It Li rdszdword
1155.It Li rdmodwr
1156.El
1157.Pp
1158The default is to count all types of commands.
1159.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
1161This event may be further qualified using
1162.Ar qualifier ,
1163which is a
1164.Ql +
1165separated set of the following keywords:
1166.Pp
1167.Bl -tag -width indent -compact
1168.\" XXX doc is unclear whether these are cycle counts or event counts
1169.It Li dimm-turnaround
1170Count DIMM turnarounds.
1171.It Li read-to-write-turnaround
1172Count read to write turnarounds.
1173.It Li write-to-read-turnaround
1174Count write to read turnarounds.
1175.El
1176.Pp
1177The default is to count all types of events.
1178.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
1179.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
1180.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
1181Count events on the HyperTransport(tm) buses.
1182These events may be further qualified using
1183.Ar qualifier ,
1184which is a
1185.Ql +
1186separated set of the following keywords:
1187.Pp
1188.Bl -tag -width indent -compact
1189.It Li buffer-release
1190Count buffer release messages sent.
1191.It Li command
1192Count command messages sent.
1193.It Li data
1194Count data messages sent.
1195.It Li nop
1196Count nop messages sent.
1197.El
1198.Pp
1199The default is to count all types of messages.
1200.El
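.Pp
The qualified K8 event specifiers listed above share one shape: the event
name, optionally followed by
.Dq Li ,mask=
and a
.Ql +
separated set of keywords.
The following C fragment is a minimal sketch of composing such a string.
The helper
.Fn k8_event_spec
is hypothetical, written only for this illustration, and is not part of
the library; the resulting string is what an application would then hand
to an allocation call such as
.Fn pmc_allocate .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): write
 * "event,mask=kw1+kw2+..." into buf.  With nkw == 0 only the event
 * name is written, which selects the documented default for the event.
 * Returns 0 on success, -1 on an empty name or if buf is too small.
 */
static int
k8_event_spec(char *buf, size_t len, const char *event,
    const char *const *kw, size_t nkw)
{
	size_t i, used;
	int n;

	assert(buf != NULL && len > 0 && event != NULL);

	if (strlen(event) == 0)
		return (-1);

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;

	for (i = 0; i < nkw; i++) {
		/* The first keyword follows ",mask="; later ones follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", kw[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```
.Ed
.Pp
For example, calling the helper with the event
.Li k8-fr-retired-fpu-instructions
and the keywords
.Li x87
and
.Li mmx-3dnow
yields
.Dq Li k8-fr-retired-fpu-instructions,mask=x87+mmx-3dnow .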
.Ss Intel P6 PMCs
1202Intel P6 PMCs are present in Intel
1203.Tn "Pentium Pro" ,
1204.Tn "Pentium II" ,
1205.Tn Celeron ,
1206.Tn "Pentium III"
1207and
1208.Tn "Pentium M"
1209processors.
1210.Pp
1211These CPUs have two counters.
1212Some events may only be used on specific counters and some events are
1213defined only on specific processor models.
1214.Pp
1215These PMCs are documented in
1216.Rs
1217.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
1218.%T "Volume 3: System Programming Guide"
1219.%N "Order Number 245472-012"
1220.%D 2003
1221.%Q "Intel Corporation"
1222.Re
1223.Pp
1224Some of these events are affected by processor errata described in
1225.Rs
1226.%B "Intel(R) Pentium(R) III Processor Specification Update"
1227.%N "Document Number: 244453-054"
1228.%D "April 2005"
1229.%Q "Intel Corporation"
1230.Re
1231.Pp
1232Event specifiers for Intel P6 PMCs can have the following common
1233qualifiers:
1234.Bl -tag -width indent
1235.It Li cmask= Ns Ar value
1236Configure the PMC to increment only if the number of configured
1237events measured in a cycle is greater than or equal to
1238.Ar value .
1239.It Li edge
1240Configure the PMC to count the number of deasserted to asserted
1241transitions of the conditions expressed by the other qualifiers.
1242If specified, the counter will increment only once whenever a
1243condition becomes true, irrespective of the number of clocks during
1244which the condition remains true.
1245.It Li inv
Invert the sense of comparison when the
1247.Dq Li cmask
1248qualifier is present, making the counter increment when the number of
1249events per cycle is less than the value specified by the
1250.Dq Li cmask
1251qualifier.
1252.It Li os
1253Configure the PMC to count events happening at processor privilege
1254level 0.
1255.It Li umask= Ns Ar value
1256This qualifier is used to further qualify the event selected (see
1257below).
1258.It Li usr
1259Configure the PMC to count events occurring at privilege levels 1, 2
1260or 3.
1261.El
1262.Pp
1263If neither of the
1264.Dq Li os
1265or
1266.Dq Li usr
1267qualifiers are specified, the default is to enable both.
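.Pp
The common qualifiers above can be combined on a single event specifier.
The following C fragment is a minimal sketch of doing so, assuming that
qualifiers are appended to the event name as a comma-separated list in the
same way as the
.Dq Li ,umask=
qualifier shown in the event descriptions.
The structure
.Vt "struct p6_quals"
and the functions
.Fn append_qual
and
.Fn p6_event_spec
are hypothetical names introduced for this illustration; rejecting
.Dq Li inv
without
.Dq Li cmask
is a choice made by the sketch, not documented library behavior.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative option set; the field names are assumptions of this sketch. */
struct p6_quals {
	int cmask;	/* negative: omit the cmask qualifier */
	int edge;	/* nonzero: append "edge" */
	int inv;	/* nonzero: append "inv" (requires cmask here) */
	int os;		/* nonzero: append "os" */
	int usr;	/* nonzero: append "usr" */
};

/* Append ",text" to buf; return 0 on success, -1 if it does not fit. */
static int
append_qual(char *buf, size_t len, const char *text)
{
	size_t used = strlen(buf);
	int n;

	n = snprintf(buf + used, len - used, ",%s", text);
	return ((n < 0 || (size_t)n >= len - used) ? -1 : 0);
}

/* Write "event[,cmask=N][,edge][,inv][,os][,usr]" into buf. */
static int
p6_event_spec(char *buf, size_t len, const char *event,
    const struct p6_quals *q)
{
	char num[32];
	int n;

	assert(buf != NULL && len > 0 && event != NULL && q != NULL);

	/* "inv" is only described in combination with "cmask". */
	if (q->inv && q->cmask < 0)
		return (-1);

	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);

	if (q->cmask >= 0) {
		snprintf(num, sizeof(num), "cmask=%d", q->cmask);
		if (append_qual(buf, len, num) != 0)
			return (-1);
	}
	if (q->edge && append_qual(buf, len, "edge") != 0)
		return (-1);
	if (q->inv && append_qual(buf, len, "inv") != 0)
		return (-1);
	if (q->os && append_qual(buf, len, "os") != 0)
		return (-1);
	if (q->usr && append_qual(buf, len, "usr") != 0)
		return (-1);
	return (0);
}
```
.Ed
.Pp
With
.Va cmask
set to 2 and both
.Va inv
and
.Va usr
set, the event
.Li p6-inst-retired
produces
.Dq Li p6-inst-retired,cmask=2,inv,usr ,
which requests a user-mode count of cycles in which fewer than two
instructions retired.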
1268.Pp
1269The event specifiers supported by Intel P6 PMCs are:
1270.Bl -tag -width indent
1271.It Li p6-baclears
1272Count the number of times a static branch prediction was made by the
1273branch decoder because the BTB did not have a prediction.
1274.It Li p6-br-bac-missp-exec
1275.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
1277mispredicted at the Front End (BAC).
1278.It Li p6-br-bogus
1279Count the number of bogus branches.
1280.It Li p6-br-call-exec
1281.Pq Tn "Pentium M"
1282Count the number of call instructions executed.
1283.It Li p6-br-call-missp-exec
1284.Pq Tn "Pentium M"
1285Count the number of call instructions executed that were mispredicted.
1286.It Li p6-br-cnd-exec
1287.Pq Tn "Pentium M"
1288Count the number of conditional branch instructions executed.
1289.It Li p6-br-cnd-missp-exec
1290.Pq Tn "Pentium M"
1291Count the number of conditional branch instructions executed that were
1292mispredicted.
1293.It Li p6-br-ind-call-exec
1294.Pq Tn "Pentium M"
1295Count the number of indirect call instructions executed.
1296.It Li p6-br-ind-exec
1297.Pq Tn "Pentium M"
1298Count the number of indirect branch instructions executed.
1299.It Li p6-br-ind-missp-exec
1300.Pq Tn "Pentium M"
1301Count the number of indirect branch instructions executed that were
1302mispredicted.
1303.It Li p6-br-inst-decoded
1304Count the number of branch instructions decoded.
1305.It Li p6-br-inst-exec
1306.Pq Tn "Pentium M"
Count the number of branch instructions executed but not necessarily retired.
1308.It Li p6-br-inst-retired
1309Count the number of branch instructions retired.
1310.It Li p6-br-miss-pred-retired
1311Count the number of mispredicted branch instructions retired.
1312.It Li p6-br-miss-pred-taken-ret
1313Count the number of taken mispredicted branches retired.
1314.It Li p6-br-missp-exec
1315.Pq Tn "Pentium M"
1316Count the number of branch instructions executed that were
1317mispredicted at execution.
1318.It Li p6-br-ret-bac-missp-exec
1319.Pq Tn "Pentium M"
1320Count the number of return instructions executed that were
1321mispredicted at the Front End (BAC).
1322.It Li p6-br-ret-exec
1323.Pq Tn "Pentium M"
1324Count the number of return instructions executed.
1325.It Li p6-br-ret-missp-exec
1326.Pq Tn "Pentium M"
1327Count the number of return instructions executed that were
1328mispredicted at execution.
1329.It Li p6-br-taken-retired
1330Count the number of taken branches retired.
1331.It Li p6-btb-misses
1332Count the number of branches for which the BTB did not produce a
1333prediction.
1334.It Li p6-bus-bnr-drv
1335Count the number of bus clock cycles during which this processor is
1336driving the BNR# pin.
1337.It Li p6-bus-data-rcv
1338Count the number of bus clock cycles during which this processor is
1339receiving data.
1340.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
1341Count the number of clocks during which DRDY# is asserted.
1342An additional qualifier may be specified, and comprises one of the
1343following keywords:
1344.Pp
1345.Bl -tag -width indent -compact
1346.It Li any
1347Count transactions generated by any agent on the bus.
1348.It Li self
1349Count transactions generated by this processor.
1350.El
1351.Pp
1352The default is to count operations generated by this processor.
1353.It Li p6-bus-hit-drv
1354Count the number of bus clock cycles during which this processor is
1355driving the HIT# pin.
1356.It Li p6-bus-hitm-drv
1357Count the number of bus clock cycles during which this processor is
1358driving the HITM# pin.
1359.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which LOCK# is asserted on the
1361external system bus.
1362An additional qualifier may be specified and comprises one of the following
1363keywords:
1364.Pp
1365.Bl -tag -width indent -compact
1366.It Li any
1367Count transactions generated by any agent on the bus.
1368.It Li self
1369Count transactions generated by this processor.
1370.El
1371.Pp
1372The default is to count operations generated by this processor.
1373.It Li p6-bus-req-outstanding
1374Count the number of bus requests outstanding in any given cycle.
1375.It Li p6-bus-snoop-stall
1376Count the number of clock cycles during which the bus is snoop stalled.
1377.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
1378Count the number of completed bus transactions of any kind.
1379An additional qualifier may be specified and comprises one of the following
1380keywords:
1381.Pp
1382.Bl -tag -width indent -compact
1383.It Li any
1384Count transactions generated by any agent on the bus.
1385.It Li self
1386Count transactions generated by this processor.
1387.El
1388.Pp
1389The default is to count operations generated by this processor.
1390.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
1391Count the number of burst read transactions.
1392An additional qualifier may be specified and comprises one of the following
1393keywords:
1394.Pp
1395.Bl -tag -width indent -compact
1396.It Li any
1397Count transactions generated by any agent on the bus.
1398.It Li self
1399Count transactions generated by this processor.
1400.El
1401.Pp
1402The default is to count operations generated by this processor.
1403.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
1404Count the number of completed burst transactions.
1405An additional qualifier may be specified and comprises one of the following
1406keywords:
1407.Pp
1408.Bl -tag -width indent -compact
1409.It Li any
1410Count transactions generated by any agent on the bus.
1411.It Li self
1412Count transactions generated by this processor.
1413.El
1414.Pp
1415The default is to count operations generated by this processor.
1416.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
1417Count the number of completed deferred transactions.
1418An additional qualifier may be specified and comprises one of the following
1419keywords:
1420.Pp
1421.Bl -tag -width indent -compact
1422.It Li any
1423Count transactions generated by any agent on the bus.
1424.It Li self
1425Count transactions generated by this processor.
1426.El
1427.Pp
1428The default is to count operations generated by this processor.
1429.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
1430Count the number of completed instruction fetch transactions.
1431An additional qualifier may be specified and comprises one of the following
1432keywords:
1433.Pp
1434.Bl -tag -width indent -compact
1435.It Li any
1436Count transactions generated by any agent on the bus.
1437.It Li self
1438Count transactions generated by this processor.
1439.El
1440.Pp
1441The default is to count operations generated by this processor.
1442.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
1443Count the number of completed invalidate transactions.
1444An additional qualifier may be specified and comprises one of the following
1445keywords:
1446.Pp
1447.Bl -tag -width indent -compact
1448.It Li any
1449Count transactions generated by any agent on the bus.
1450.It Li self
1451Count transactions generated by this processor.
1452.El
1453.Pp
1454The default is to count operations generated by this processor.
1455.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
1456Count the number of completed memory transactions.
1457An additional qualifier may be specified and comprises one of the following
1458keywords:
1459.Pp
1460.Bl -tag -width indent -compact
1461.It Li any
1462Count transactions generated by any agent on the bus.
1463.It Li self
1464Count transactions generated by this processor.
1465.El
1466.Pp
1467The default is to count operations generated by this processor.
1468.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
1469Count the number of completed partial write transactions.
1470An additional qualifier may be specified and comprises one of the following
1471keywords:
1472.Pp
1473.Bl -tag -width indent -compact
1474.It Li any
1475Count transactions generated by any agent on the bus.
1476.It Li self
1477Count transactions generated by this processor.
1478.El
1479.Pp
1480The default is to count operations generated by this processor.
1481.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
1482Count the number of completed read-for-ownership transactions.
1483An additional qualifier may be specified and comprises one of the following
1484keywords:
1485.Pp
1486.Bl -tag -width indent -compact
1487.It Li any
1488Count transactions generated by any agent on the bus.
1489.It Li self
1490Count transactions generated by this processor.
1491.El
1492.Pp
1493The default is to count operations generated by this processor.
1494.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
1495Count the number of completed I/O transactions.
1496An additional qualifier may be specified and comprises one of the following
1497keywords:
1498.Pp
1499.Bl -tag -width indent -compact
1500.It Li any
1501Count transactions generated by any agent on the bus.
1502.It Li self
1503Count transactions generated by this processor.
1504.El
1505.Pp
1506The default is to count operations generated by this processor.
1507.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
1508Count the number of completed partial transactions.
1509An additional qualifier may be specified and comprises one of the following
1510keywords:
1511.Pp
1512.Bl -tag -width indent -compact
1513.It Li any
1514Count transactions generated by any agent on the bus.
1515.It Li self
1516Count transactions generated by this processor.
1517.El
1518.Pp
1519The default is to count operations generated by this processor.
1520.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
1521Count the number of completed write-back transactions.
1522An additional qualifier may be specified and comprises one of the following
1523keywords:
1524.Pp
1525.Bl -tag -width indent -compact
1526.It Li any
1527Count transactions generated by any agent on the bus.
1528.It Li self
1529Count transactions generated by this processor.
1530.El
1531.Pp
1532The default is to count operations generated by this processor.
1533.It Li p6-cpu-clk-unhalted
Count the number of cycles during which the processor was not halted.
.Pp
.Pq Tn "Pentium M"
Count the number of cycles during which the processor was not halted
1538and not in a thermal trip.
1539.It Li p6-cycles-div-busy
1540Count the number of cycles during which the divider is busy and cannot
1541accept new divides.
1542This event is only allocated on counter 0.
.It Li p6-cycles-int-pending-and-masked
1544Count the number of processor cycles for which interrupts were
1545disabled and interrupts were pending.
1546.It Li p6-cycles-int-masked
1547Count the number of processor cycles for which interrupts were
1548disabled.
1549.It Li p6-data-mem-refs
1550Count all loads and all stores using any memory type, including
1551internal retries.
1552Each part of a split store is counted separately.
1553.It Li p6-dcu-lines-in
1554Count the total lines allocated in the data cache unit.
1555.It Li p6-dcu-m-lines-in
1556Count the number of M state lines allocated in the data cache unit.
1557.It Li p6-dcu-m-lines-out
1558Count the number of M state lines evicted from the data cache unit.
1559.It Li p6-dcu-miss-outstanding
1560Count the weighted number of cycles while a data cache unit miss is
1561outstanding, incremented by the number of outstanding cache misses at
1562any time.
1563.It Li p6-div
1564Count the number of integer and floating-point divides including
1565speculative divides.
1566This event is only allocated on counter 1.
1567.It Li p6-emon-esp-uops
1568.Pq Tn "Pentium M"
1569Count the total number of micro-ops.
1570.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
1571.Pq Tn "Pentium M"
1572Count the number of
1573.Tn "Enhanced Intel SpeedStep"
1574transitions.
1575An additional qualifier may be specified, and can be one of the
1576following keywords:
1577.Pp
1578.Bl -tag -width indent -compact
1579.It Li all
1580Count all transitions.
1581.It Li freq
1582Count only frequency transitions.
1583.El
1584.Pp
1585The default is to count all transitions.
1586.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
1587.Pq Tn "Pentium M"
1588Count the number of retired fused micro-ops.
1589An additional qualifier may be specified, and may be one of the
1590following keywords:
1591.Pp
1592.Bl -tag -width indent -compact
1593.It Li all
1594Count all fused micro-ops.
1595.It Li loadop
1596Count only load and op micro-ops.
1597.It Li stdsta
1598Count only STD/STA micro-ops.
1599.El
1600.Pp
1601The default is to count all fused micro-ops.
.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
1603.Pq Tn "Pentium III"
1604Count the number of SSE computational instructions retired.
1605An additional qualifier may be specified, and comprises one of the
1606following keywords:
1607.Pp
1608.Bl -tag -width indent -compact
1609.It Li packed-and-scalar
1610Count packed and scalar operations.
1611.It Li scalar
1612Count scalar operations only.
1613.El
1614.Pp
1615The default is to count packed and scalar operations.
1616.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
1617.Pq Tn "Pentium III"
1618Count the number of SSE instructions retired.
1619An additional qualifier may be specified, and comprises one of the
1620following keywords:
1621.Pp
1622.Bl -tag -width indent -compact
1623.It Li packed-and-scalar
1624Count packed and scalar operations.
1625.It Li scalar
1626Count scalar operations only.
1627.El
1628.Pp
1629The default is to count packed and scalar operations.
1630.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
1631.Pq Tn "Pentium III"
1632Count the number of SSE prefetch or weakly ordered instructions
1633dispatched (including speculative prefetches).
1634An additional qualifier may be specified, and comprises one of the
1635following keywords:
1636.Pp
1637.Bl -tag -width indent -compact
1638.It Li nta
1639Count non-temporal prefetches.
1640.It Li t1
1641Count prefetches to L1.
1642.It Li t2
1643Count prefetches to L2.
1644.It Li wos
1645Count weakly ordered stores.
1646.El
1647.Pp
1648The default is to count non-temporal prefetches.
1649.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
1650.Pq Tn "Pentium III"
1651Count the number of prefetch or weakly ordered instructions that miss
1652all caches.
1653An additional qualifier may be specified, and comprises one of the
1654following keywords:
1655.Pp
1656.Bl -tag -width indent -compact
1657.It Li nta
1658Count non-temporal prefetches.
1659.It Li t1
1660Count prefetches to L1.
1661.It Li t2
1662Count prefetches to L2.
1663.It Li wos
1664Count weakly ordered stores.
1665.El
1666.Pp
1667The default is to count non-temporal prefetches.
1668.It Li p6-emon-pref-rqsts-dn
1669.Pq Tn "Pentium M"
1670Count the number of downward prefetches issued.
1671.It Li p6-emon-pref-rqsts-up
1672.Pq Tn "Pentium M"
1673Count the number of upward prefetches issued.
1674.It Li p6-emon-simd-instr-retired
1675.Pq Tn "Pentium M"
1676Count the number of retired
1677.Tn MMX
1678instructions.
1679.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
1680.Pq Tn "Pentium M"
1681Count the number of computational SSE instructions retired.
1682An additional qualifier may be specified and can be one of the
1683following keywords:
1684.Pp
1685.Bl -tag -width indent -compact
1686.It Li sse-packed-single
1687Count SSE packed-single instructions.
1688.It Li sse-scalar-single
1689Count SSE scalar-single instructions.
1690.It Li sse2-packed-double
1691Count SSE2 packed-double instructions.
1692.It Li sse2-scalar-double
1693Count SSE2 scalar-double instructions.
1694.El
1695.Pp
1696The default is to count SSE packed-single instructions.
.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
1699.Pq Tn "Pentium M"
1700Count the number of SSE instructions retired.
1701An additional qualifier can be specified, and can be one of the
1702following keywords:
1703.Pp
1704.Bl -tag -width indent -compact
1705.It Li sse-packed-single
1706Count SSE packed-single instructions.
1707.It Li sse-packed-single-scalar-single
1708Count SSE packed-single and scalar-single instructions.
1709.It Li sse2-packed-double
1710Count SSE2 packed-double instructions.
1711.It Li sse2-scalar-double
1712Count SSE2 scalar-double instructions.
1713.El
1714.Pp
1715The default is to count SSE packed-single instructions.
1716.It Li p6-emon-synch-uops
1717.Pq Tn "Pentium M"
1718Count the number of sync micro-ops.
1719.It Li p6-emon-thermal-trip
1720.Pq Tn "Pentium M"
1721Count the duration or occurrences of thermal trips.
1722Use the
1723.Dq Li edge
1724qualifier to count occurrences of thermal trips.
1725.It Li p6-emon-unfusion
1726.Pq Tn "Pentium M"
1727Count the number of unfusion events in the reorder buffer.
1728.It Li p6-flops
1729Count the number of computational floating point operations retired.
1730This event is only allocated on counter 0.
1731.It Li p6-fp-assist
1732Count the number of floating point exceptions handled by microcode.
1733This event is only allocated on counter 1.
1734.It Li p6-fp-comps-ops-exe
Count the number of computational floating point operations executed.
1736This event is only allocated on counter 0.
1737.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
1738.Pq Tn "Pentium II" , Tn "Pentium III"
1739Count the number of transitions between MMX and floating-point
1740instructions.
1741An additional qualifier may be specified, and comprises one of the
1742following keywords:
1743.Pp
1744.Bl -tag -width indent -compact
1745.It Li mmxtofp
1746Count transitions from MMX instructions to floating-point instructions.
1747.It Li fptommx
1748Count transitions from floating-point instructions to MMX instructions.
1749.El
1750.Pp
1751The default is to count MMX to floating-point transitions.
1752.It Li p6-hw-int-rx
1753Count the number of hardware interrupts received.
1754.It Li p6-ifu-fetch
1755Count the number of instruction fetches, both cacheable and non-cacheable.
1756.It Li p6-ifu-fetch-miss
1757Count the number of instruction fetch misses (i.e., those that produce
1758memory accesses).
1759.It Li p6-ifu-mem-stall
1760Count the number of cycles instruction fetch is stalled for any reason.
1761.It Li p6-ild-stall
1762Count the number of cycles the instruction length decoder is stalled.
1763.It Li p6-inst-decoded
1764Count the number of instructions decoded.
1765.It Li p6-inst-retired
1766Count the number of instructions retired.
1767.It Li p6-itlb-miss
1768Count the number of instruction TLB misses.
1769.It Li p6-l2-ads
1770Count the number of L2 address strobes.
1771.It Li p6-l2-dbus-busy
1772Count the number of cycles during which the L2 cache data bus was busy.
1773.It Li p6-l2-dbus-busy-rd
1774Count the number of cycles during which the L2 cache data bus was busy
1775transferring read data from L2 to the processor.
1776.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
1777Count the number of L2 instruction fetches.
1778An additional qualifier may be specified and comprises a list of the following
1779keywords separated by
1780.Ql +
1781characters:
1782.Pp
1783.Bl -tag -width indent -compact
1784.It Li e
1785Count operations affecting E (exclusive) state lines.
1786.It Li i
1787Count operations affecting I (invalid) state lines.
1788.It Li m
1789Count operations affecting M (modified) state lines.
1790.It Li s
1791Count operations affecting S (shared) state lines.
1792.El
1793.Pp
1794The default is to count operations affecting all (MESI) state lines.
1795.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
1796Count the number of L2 data loads.
1797An additional qualifier may be specified and comprises a list of the following
1798keywords separated by
1799.Ql +
1800characters:
1801.Pp
1802.Bl -tag -width indent -compact
1803.It Li both
1804.Pq Tn "Pentium M"
1805Count both hardware-prefetched lines and non-hardware-prefetched lines.
1806.It Li e
1807Count operations affecting E (exclusive) state lines.
1808.It Li hw
1809.Pq Tn "Pentium M"
1810Count hardware-prefetched lines only.
1811.It Li i
1812Count operations affecting I (invalid) state lines.
1813.It Li m
1814Count operations affecting M (modified) state lines.
1815.It Li nonhw
1816.Pq Tn "Pentium M"
1817Exclude hardware-prefetched lines.
1818.It Li s
1819Count operations affecting S (shared) state lines.
1820.El
1821.Pp
1822The default on processors other than
1823.Tn "Pentium M"
1824processors is to count operations affecting all (MESI) state lines.
1825The default on
1826.Tn "Pentium M"
1827processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E53.
1831.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
1832Count the number of L2 lines allocated.
1833An additional qualifier may be specified and comprises a list of the following
1834keywords separated by
1835.Ql +
1836characters:
1837.Pp
1838.Bl -tag -width indent -compact
1839.It Li both
1840.Pq Tn "Pentium M"
1841Count both hardware-prefetched lines and non-hardware-prefetched lines.
1842.It Li e
1843Count operations affecting E (exclusive) state lines.
1844.It Li hw
1845.Pq Tn "Pentium M"
1846Count hardware-prefetched lines only.
1847.It Li i
1848Count operations affecting I (invalid) state lines.
1849.It Li m
1850Count operations affecting M (modified) state lines.
1851.It Li nonhw
1852.Pq Tn "Pentium M"
1853Exclude hardware-prefetched lines.
1854.It Li s
1855Count operations affecting S (shared) state lines.
1856.El
1857.Pp
1858The default on processors other than
1859.Tn "Pentium M"
1860processors is to count operations affecting all (MESI) state lines.
1861The default on
1862.Tn "Pentium M"
1863processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E45.
1867.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
1868Count the number of L2 lines evicted.
1869An additional qualifier may be specified and comprises a list of the following
1870keywords separated by
1871.Ql +
1872characters:
1873.Pp
1874.Bl -tag -width indent -compact
1875.It Li both
1876.Pq Tn "Pentium M"
1877Count both hardware-prefetched lines and non-hardware-prefetched lines.
1878.It Li e
1879Count operations affecting E (exclusive) state lines.
1880.It Li hw
1881.Pq Tn "Pentium M"
1882Count hardware-prefetched lines only.
1883.It Li i
1884Count operations affecting I (invalid) state lines.
1885.It Li m
1886Count operations affecting M (modified) state lines.
1887.It Li nonhw
.Pq Tn "Pentium M"
1889Exclude hardware-prefetched lines.
1890.It Li s
1891Count operations affecting S (shared) state lines.
1892.El
1893.Pp
1894The default on processors other than
1895.Tn "Pentium M"
1896processors is to count operations affecting all (MESI) state lines.
1897The default on
1898.Tn "Pentium M"
1899processors is to count both hardware-prefetched and
1900non-hardware-prefetch operations on all (MESI) state lines.
1901.Pq Errata
1902This event is affected by processor errata E45.
1903.It Li p6-l2-m-lines-inm
1904Count the number of modified lines allocated in L2 cache.
1905.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
1906Count the number of L2 M-state lines evicted.
1907.Pp
1908.Pq Tn "Pentium M"
1909On these processors an additional qualifier may be specified and
1910comprises a list of the following keywords separated by
1911.Ql +
1912characters:
1913.Pp
1914.Bl -tag -width indent -compact
1915.It Li both
1916Count both hardware-prefetched lines and non-hardware-prefetched lines.
1917.It Li hw
1918Count hardware-prefetched lines only.
1919.It Li nonhw
1920Exclude hardware-prefetched lines.
1921.El
1922.Pp
1923The default is to count both hardware-prefetched and
1924non-hardware-prefetch operations.
1925.Pq Errata
1926This event is affected by processor errata E53.
1927.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
1928Count the total number of L2 requests.
1929An additional qualifier may be specified and comprises a list of the following
1930keywords separated by
1931.Ql +
1932characters:
1933.Pp
1934.Bl -tag -width indent -compact
1935.It Li e
1936Count operations affecting E (exclusive) state lines.
1937.It Li i
1938Count operations affecting I (invalid) state lines.
1939.It Li m
1940Count operations affecting M (modified) state lines.
1941.It Li s
1942Count operations affecting S (shared) state lines.
1943.El
1944.Pp
1945The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier
1947Count the number of L2 data stores.
1948An additional qualifier may be specified and comprises a list of the following
1949keywords separated by
1950.Ql +
1951characters:
1952.Pp
1953.Bl -tag -width indent -compact
1954.It Li e
1955Count operations affecting E (exclusive) state lines.
1956.It Li i
1957Count operations affecting I (invalid) state lines.
1958.It Li m
1959Count operations affecting M (modified) state lines.
1960.It Li s
1961Count operations affecting S (shared) state lines.
1962.El
1963.Pp
1964The default is to count operations affecting all (MESI) state lines.
1965.It Li p6-ld-blocks
1966Count the number of load operations delayed due to store buffer blocks.
1967.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (those crossing a
64-bit boundary).
1970.It Li p6-mmx-assist
1971.Pq Tn "Pentium II" , Tn "Pentium III"
1972Count the number of MMX assists executed.
1973.It Li p6-mmx-instr-exec
1974.Pq Tn Celeron , Tn "Pentium II"
1975Count the number of MMX instructions executed, except MOVQ and MOVD
1976stores from register to memory.
1977.It Li p6-mmx-instr-ret
1978.Pq Tn "Pentium II"
1979Count the number of MMX instructions retired.
1980.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
1981.Pq Tn "Pentium II" , Tn "Pentium III"
1982Count the number of MMX instructions executed.
1983An additional qualifier may be specified and comprises a list of
1984the following keywords separated by
1985.Ql +
1986characters:
1987.Pp
1988.Bl -tag -width indent -compact
1989.It Li pack
1990Count MMX pack operation instructions.
1991.It Li packed-arithmetic
1992Count MMX packed arithmetic instructions.
1993.It Li packed-logical
1994Count MMX packed logical instructions.
1995.It Li packed-multiply
1996Count MMX packed multiply instructions.
1997.It Li packed-shift
1998Count MMX packed shift instructions.
1999.It Li unpack
2000Count MMX unpack operation instructions.
2001.El
2002.Pp
2003The default is to count all operations.
2004.It Li p6-mmx-sat-instr-exec
2005.Pq Tn "Pentium II" , Tn "Pentium III"
2006Count the number of MMX saturating instructions executed.
2007.It Li p6-mmx-uops-exec
2008.Pq Tn "Pentium II" , Tn "Pentium III"
2009Count the number of MMX micro-ops executed.
2010.It Li p6-mul
2011Count the number of integer and floating-point multiplies, including
2012speculative multiplies.
2013This event is only allocated on counter 1.
2014.It Li p6-partial-rat-stalls
2015Count the number of cycles or events for partial stalls.
2016.It Li p6-resource-stalls
2017Count the number of cycles there was a resource related stall of any kind.
2018.It Li p6-ret-seg-renames
2019.Pq Tn "Pentium II" , Tn "Pentium III"
2020Count the number of segment register rename events retired.
2021.It Li p6-sb-drains
2022Count the number of cycles the store buffer is draining.
2023.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
2024.Pq Tn "Pentium II" , Tn "Pentium III"
2025Count the number of segment register renames.
2026An additional qualifier may be specified, and comprises a list of the
2027following keywords separated by
2028.Ql +
2029characters:
2030.Pp
2031.Bl -tag -width indent -compact
2032.It Li ds
2033Count renames for segment register DS.
2034.It Li es
2035Count renames for segment register ES.
2036.It Li fs
2037Count renames for segment register FS.
2038.It Li gs
2039Count renames for segment register GS.
2040.El
2041.Pp
2042The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
2044.Pq Tn "Pentium II" , Tn "Pentium III"
2045Count the number of segment register renaming stalls.
2046An additional qualifier may be specified, and comprises a list of the
2047following keywords separated by
2048.Ql +
2049characters:
2050.Pp
2051.Bl -tag -width indent -compact
2052.It Li ds
2053Count stalls for segment register DS.
2054.It Li es
2055Count stalls for segment register ES.
2056.It Li fs
2057Count stalls for segment register FS.
2058.It Li gs
2059Count stalls for segment register GS.
2060.El
2061.Pp
2062The default is to count operations affecting all the segment registers.
2063.It Li p6-segment-reg-loads
2064Count the number of segment register loads.
2065.It Li p6-uops-retired
2066Count the number of micro-ops retired.
2067.El
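.Pp
As an illustration of the MESI qualifier syntax accepted by events such as
.Li p6-l2-rqsts ,
the following C sketch checks that a string is a
.Ql +
separated list of the keywords
.Li m , e , s
and
.Li i .
The function name
.Fn mesi_umask_is_valid
is hypothetical and is not part of the library, which performs its own
parsing of event specifiers.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: return true if the string is a
 * non-empty '+'-separated list of the single-letter keywords m, e, s, i.
 */
static bool
mesi_umask_is_valid(const char *umask)
{
	size_t i, n;

	assert(umask != NULL);
	n = strlen(umask);
	/* Keywords sit at even offsets and '+' at odd ones: length is odd. */
	if (n % 2 == 0)
		return (false);
	for (i = 0; i < n; i++) {
		if (i % 2 == 0) {
			if (strchr("mesi", umask[i]) == NULL)
				return (false);
		} else if (umask[i] != '+')
			return (false);
	}
	return (true);
}
```
.Ed
.Pp
Under these assumptions a list such as
.Dq Li m+e
is accepted, while an empty string or a list with a trailing
.Ql +
is rejected.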
2068.Ss Intel P4 PMCS
2069Intel P4 PMCs are present in Intel
2070.Tn "Pentium 4"
2071and
2072.Tn Xeon
2073processors.
2074These PMCs are documented in
2075.Rs
2076.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
2077.%T "Volume 3: System Programming Guide"
2078.%N "Order Number 245472-012"
2079.%D 2003
2080.%Q "Intel Corporation"
2081.Re
2082Further information about using these PMCs may be found in
2083.Rs
2084.%B "IA-32 Intel(R) Architecture Optimization Guide"
2085.%D 2003
2086.%N "Order Number 248966-009"
2087.%Q "Intel Corporation"
2088.Re
2089Some of these events are affected by processor errata described in
2090.Rs
2091.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
2092.%N "Document Number: 249199-059"
2093.%D "April 2005"
2094.%Q "Intel Corporation"
2095.Re
2096.Pp
2097Event specifiers for Intel P4 PMCs can have the following common
2098qualifiers:
2099.Bl -tag -width indent
2100.It Li active= Ns Ar choice
2101(On P4 HTT CPUs) Filter event counting based on which logical
2102processors are active.
2103The allowed values of
2104.Ar choice
2105are:
2106.Pp
2107.Bl -tag -width indent -compact
2108.It Li any
2109Count when either logical processor is active.
2110.It Li both
2111Count when both logical processors are active.
2112.It Li none
2113Count only when neither logical processor is active.
2114.It Li single
2115Count only when one logical processor is active.
2116.El
2117.Pp
2118The default is
2119.Dq Li both .
2120.It Li cascade
2121Configure the PMC to cascade onto its partner.
2122See
2123.Sx "Cascading P4 PMCs"
2124below for more information.
2125.It Li edge
Configure the counter to count false-to-true transitions of the threshold
comparison output.
2128This qualifier only takes effect if a threshold qualifier has also been
2129specified.
2130.It Li complement
2131Configure the counter to increment only when the event count seen is
2132less than the threshold qualifier value specified.
2133.It Li mask= Ns Ar qualifier
2134Many event specifiers for Intel P4 PMCs need to be additionally
2135qualified using a mask qualifier.
2136The allowed syntax for these qualifiers is event specific and is
2137described along with the events.
2138.It Li os
2139Configure the PMC to count when the CPL of the processor is 0.
2140.It Li precise
2141Select precise event based sampling.
2142Precise sampling is supported by the hardware for a limited set of
2143events.
2144.It Li tag= Ns Ar value
2145Configure the PMC to tag the internal uop selected by the other
2146fields in this event specifier with value
2147.Ar value .
2148This feature is used when cascading PMCs.
2149.It Li threshold= Ns Ar value
2150Configure the PMC to increment only when the event counts seen are
2151greater than the specified threshold value
2152.Ar value .
2153.It Li usr
2154Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
2155.El
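.Pp
The interaction of the
.Dq Li threshold ,
.Dq Li complement
and
.Dq Li edge
qualifiers can be pictured with a small software model.
The C sketch below follows the wording of this manual page only; the name
.Fn model_counter
and the per-cycle event array are assumptions made for illustration, and
real hardware may differ in detail (for example in how equality with the
threshold is treated).
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative model, not hardware-accurate: events[i] is the number of
 * events seen in cycle i.  Returns the value the counter would reach.
 */
static unsigned long
model_counter(const unsigned *events, size_t n, unsigned threshold,
    bool complement, bool edge)
{
	unsigned long count = 0;
	bool prev = false;

	assert(events != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Output of the threshold comparison for this cycle. */
		bool out = complement ? events[i] < threshold :
		    events[i] > threshold;

		/* With edge, count only false-to-true transitions. */
		if (edge ? (out && !prev) : out)
			count++;
		prev = out;
	}
	return (count);
}
```
.Ed
.Pp
In this model, per-cycle counts of 0, 3, 3, 0 and 5 with a threshold of 2
yield 3 without
.Dq Li edge
and 2 with it.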
2156.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
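.Pp
The qualifiers above are combined into a single comma-separated event
specifier string.
The C sketch below assembles such a string; the helper
.Fn p4_build_spec
and its parameters are hypothetical and are not provided by the library.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: format an event specifier of
 * the form "event[,mask=...][,os][,usr]".  Returns 0 on success and -1 if
 * the event name is empty or the buffer is too small.
 */
static int
p4_build_spec(char *buf, size_t len, const char *event, const char *mask,
    int os, int usr)
{
	int n;

	assert(buf != NULL && event != NULL);
	if (strlen(event) == 0)
		return (-1);
	n = snprintf(buf, len, "%s%s%s%s%s", event,
	    mask != NULL ? ",mask=" : "", mask != NULL ? mask : "",
	    os ? ",os" : "", usr ? ",usr" : "");
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```
.Ed
.Pp
For example, an event of
.Dq Li p4-branch-retired ,
a mask of
.Dq Li mmtm+mmnm
and the
.Dq Li usr
flag produce
.Dq Li p4-branch-retired,mask=mmtm+mmnm,usr ,
which is the kind of string accepted by
.Xr pmc_allocate 3 .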
2162.Pp
2163On Intel Pentium 4 processors with HTT, events are
2164divided into two classes:
2165.Pp
2166.Bl -tag -width indent -compact
2167.It "TS Events"
are those where hardware can distinguish events generated on one
logical processor from those generated on the other.
2171.It "TI Events"
2172are those where hardware cannot differentiate between events
2173generated by multiple logical processors in a package.
2174.El
2175.Pp
2176Only TS events are allowed for use with process-mode PMCs on
2177Pentium-4/HTT CPUs.
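.Pp
A program can check this restriction before attempting an allocation.
The C sketch below uses a small table holding a few of the events listed
in this section together with their TS/TI class; the table is
deliberately incomplete, and the names
.Fn p4_process_mode_ok
and
.Va p4_events
are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum p4_class { P4_TI, P4_TS };

/* A few events and their classes, as given in this manual page. */
static const struct {
	const char	*name;
	enum p4_class	 cls;
} p4_events[] = {
	{ "p4-128bit-mmx-uop",	P4_TI },
	{ "p4-b2b-cycles",	P4_TI },
	{ "p4-branch-retired",	P4_TS },
	{ "p4-bsq-allocation",	P4_TS },
	{ "p4-instr-retired",	P4_TS },
	{ "p4-page-walk-type",	P4_TI },
};

/*
 * Return 1 if the event may be used with process-mode PMCs on an HTT
 * CPU, 0 if it may not, and -1 if the event is not in the table.
 */
static int
p4_process_mode_ok(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(p4_events) / sizeof(p4_events[0]); i++)
		if (strcmp(p4_events[i].name, name) == 0)
			return (p4_events[i].cls == P4_TS);
	return (-1);
}
```
.Ed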
2178.Pp
2179The event specifiers supported by Intel P4 PMCs are:
2180.Pp
2181.Bl -tag -width indent
2182.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
2183.Pq "TI event"
2184Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
2185operands.
2186Qualifier
2187.Ar flags
2188can take the following value (which is also the default):
2189.Pp
2190.Bl -tag -width indent -compact
2191.It Li all
2192Count all uops operating on 128 bit SIMD integer operands in memory or
2193XMM register.
2194.El
2195.Pp
2196If an instruction contains more than one 128 bit MMX uop, then each
2197uop will be counted.
2198.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
2199.Pq "TI event"
2200Count MMX instructions that operate on 64 bit SIMD operands.
2201Qualifier
2202.Ar flags
2203can take the following value (which is also the default):
2204.Pp
2205.Bl -tag -width indent -compact
2206.It Li all
2207Count all uops operating on 64 bit SIMD integer operands in memory or
2208in MMX registers.
2209.El
2210.Pp
2211If an instruction contains more than one 64 bit MMX uop, then each
2212uop will be counted.
2213.It Li p4-b2b-cycles
2214.Pq "TI event"
Count back-to-back bus cycles.
2216Further documentation for this event is unavailable.
2217.It Li p4-bnr
2218.Pq "TI event"
2219Count bus-not-ready conditions.
2220Further documentation for this event is unavailable.
2221.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
2222.Pq "TS event"
2223Count instruction fetch requests qualified by additional
2224flags specified in
2225.Ar qualifier .
2226At this point only one flag is supported:
2227.Pp
2228.Bl -tag -width indent -compact
2229.It Li tcmiss
2230Count trace cache lookup misses.
2231.El
2232.Pp
2233The default qualifier is also
2234.Dq Li mask=tcmiss .
2235.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
2236.Pq "TS event"
Count retired branches.
2238Qualifier
2239.Ar flags
2240is a list of the following
2241.Ql +
2242separated strings:
2243.Pp
2244.Bl -tag -width indent -compact
2245.It Li mmnp
2246Count branches not-taken and predicted.
2247.It Li mmnm
2248Count branches not-taken and mis-predicted.
2249.It Li mmtp
2250Count branches taken and predicted.
2251.It Li mmtm
2252Count branches taken and mis-predicted.
2253.El
2254.Pp
2255The default qualifier counts all four kinds of branches.
2256.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
2257.Pq "TS event"
2258Count the number of entries (clipped at 15) currently active in the
2259BSQ.
2260Qualifier
2261.Ar qualifier
2262is a
2263.Ql +
2264separated set of the following flags:
2265.Pp
2266.Bl -tag -width indent -compact
2267.It Li req-type0 , Li req-type1
2268Forms a 2-bit number used to select the request type encoding:
2269.Pp
2270.Bl -tag -width indent -compact
2271.It Li 0
2272reads excluding read invalidate
2273.It Li 1
2274read invalidates
2275.It Li 2
2276writes other than writebacks
2277.It Li 3
2278writebacks
2279.El
2280.Pp
2281Bit
2282.Dq Li req-type1
2283is the MSB for this two bit number.
2284.It Li req-len0 , Li req-len1
2285Forms a two-bit number that specifies the request length encoding:
2286.Pp
2287.Bl -tag -width indent -compact
2288.It Li 0
22890 chunks
2290.It Li 1
22911 chunk
2292.It Li 3
22938 chunks
2294.El
2295.Pp
2296Bit
2297.Dq Li req-len1
2298is the MSB for this two bit number.
2299.It Li req-io-type
2300Count requests that are input or output requests.
2301.It Li req-lock-type
2302Count requests that lock the bus.
2303.It Li req-lock-cache
2304Count requests that lock the cache.
2305.It Li req-split-type
Count requests that are bus 8-byte chunks split across an
8-byte boundary.
2308.It Li req-dem-type
2309Count requests that are demand (not prefetches) if set.
2310Count requests that are prefetches if not set.
2311.It Li req-ord-type
2312Count requests that are ordered.
2313.It Li mem-type0 , Li mem-type1 , Li mem-type2
2314Forms a 3-bit number that specifies a memory type encoding:
2315.Pp
2316.Bl -tag -width indent -compact
2317.It Li 0
2318UC
2319.It Li 1
2320USWC
2321.It Li 4
2322WT
2323.It Li 5
2324WP
2325.It Li 6
2326WB
2327.El
2328.Pp
2329Bit
2330.Dq Li mem-type2
2331is the MSB of this 3-bit number.
2332.El
2333.Pp
2334The default qualifier has all the above bits set.
2335.Pp
2336Edge triggering using the
2337.Dq Li edge
2338qualifier should not be used with this event when counting cycles.
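.Pp
The multi-bit encodings above translate into flag lists by naming one flag
for each set bit.
The C sketch below performs that translation, assuming that naming a flag
sets the corresponding bit and that bit 0 is the least significant; the
helper
.Fn bits_to_flags
is hypothetical and is not part of the library.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: write a '+'-separated list of
 * "<prefix><bit>" names, one for every bit set in the low nbits of value.
 * Returns 0 on success and -1 if the buffer is too small.
 */
static int
bits_to_flags(unsigned value, unsigned nbits, const char *prefix,
    char *buf, size_t len)
{
	assert(prefix != NULL && buf != NULL && len > 0 && nbits < 32);
	buf[0] = 0;			/* start from an empty list */
	for (unsigned bit = 0; bit < nbits; bit++) {
		size_t used = strlen(buf);
		int n;

		if ((value & (1u << bit)) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s%u",
		    used > 0 ? "+" : "", prefix, bit);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
	}
	return (0);
}
```
.Ed
.Pp
With this helper, request type 3 (writebacks) becomes
.Dq Li req-type0+req-type1
and memory type 6 (WB) becomes
.Dq Li mem-type1+mem-type2 .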
2339.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
2340.Pq "TS event"
2341Count allocations in the bus sequence unit according to the flags
2342specified in
2343.Ar qualifier ,
2344which is a
2345.Ql +
2346separated set of the following flags:
2347.Pp
2348.Bl -tag -width indent -compact
2349.It Li req-type0 , Li req-type1
2350Forms a 2-bit number used to select the request type encoding:
2351.Pp
2352.Bl -tag -width indent -compact
2353.It Li 0
2354reads excluding read invalidate
2355.It Li 1
2356read invalidates
2357.It Li 2
2358writes other than writebacks
2359.It Li 3
2360writebacks
2361.El
2362.Pp
2363Bit
2364.Dq Li req-type1
2365is the MSB for this two bit number.
2366.It Li req-len0 , Li req-len1
2367Forms a two-bit number that specifies the request length encoding:
2368.Pp
2369.Bl -tag -width indent -compact
2370.It Li 0
23710 chunks
2372.It Li 1
23731 chunk
2374.It Li 3
23758 chunks
2376.El
2377.Pp
2378Bit
2379.Dq Li req-len1
2380is the MSB for this two bit number.
2381.It Li req-io-type
2382Count requests that are input or output requests.
2383.It Li req-lock-type
2384Count requests that lock the bus.
2385.It Li req-lock-cache
2386Count requests that lock the cache.
2387.It Li req-split-type
Count requests that are bus 8-byte chunks split across an
8-byte boundary.
2390.It Li req-dem-type
2391Count requests that are demand (not prefetches) if set.
2392Count requests that are prefetches if not set.
2393.It Li req-ord-type
2394Count requests that are ordered.
2395.It Li mem-type0 , Li mem-type1 , Li mem-type2
2396Forms a 3-bit number that specifies a memory type encoding:
2397.Pp
2398.Bl -tag -width indent -compact
2399.It Li 0
2400UC
2401.It Li 1
2402USWC
2403.It Li 4
2404WT
2405.It Li 5
2406WP
2407.It Li 6
2408WB
2409.El
2410.Pp
2411Bit
2412.Dq Li mem-type2
2413is the MSB of this 3-bit number.
2414.El
2415.Pp
2416The default qualifier has all the above bits set.
2417.Pp
2418This event is usually used along with the
2419.Dq Li edge
2420qualifier to avoid multiple counting.
2421.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
2422.Pq "TS event"
2423Count cache references as seen by the bus unit (2nd or 3rd level
2424cache references).
2425Qualifier
2426.Ar qualifier
2427is a
2428.Ql +
2429separated list of the following keywords:
2430.Pp
2431.Bl -tag -width indent -compact
2432.It Li rd-2ndl-hits
2433Count 2nd level cache hits in the shared state.
2434.It Li rd-2ndl-hite
2435Count 2nd level cache hits in the exclusive state.
2436.It Li rd-2ndl-hitm
2437Count 2nd level cache hits in the modified state.
2438.It Li rd-3rdl-hits
2439Count 3rd level cache hits in the shared state.
2440.It Li rd-3rdl-hite
2441Count 3rd level cache hits in the exclusive state.
2442.It Li rd-3rdl-hitm
2443Count 3rd level cache hits in the modified state.
2444.It Li rd-2ndl-miss
2445Count 2nd level cache misses.
2446.It Li rd-3rdl-miss
2447Count 3rd level cache misses.
2448.It Li wr-2ndl-miss
2449Count write-back lookups from the data access cache that miss the 2nd
2450level cache.
2451.El
2452.Pp
2453The default is to count all the above events.
2454.It Li p4-execution-event Op Li ,mask= Ns Ar flags
2455.Pq "TS event"
2456Count the retirement of tagged uops selected through the execution
2457tagging mechanism.
2458Qualifier
2459.Ar flags
2460can contain the following strings separated by
2461.Ql +
2462characters:
2463.Pp
2464.Bl -tag -width indent -compact
2465.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
2466The marked uops are not bogus.
2467.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
2468The marked uops are bogus.
2469.El
2470.Pp
2471This event requires additional (upstream) events to be allocated to
2472perform the desired uop tagging.
2473The default is to set all the above flags.
2474This event can be used for precise event based sampling.
2475.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
2476.Pq "TS event"
2477Count the retirement of tagged uops selected through the front-end
2478tagging mechanism.
2479Qualifier
2480.Ar flags
2481can contain the following strings separated by
2482.Ql +
2483characters:
2484.Pp
2485.Bl -tag -width indent -compact
2486.It Li nbogus
2487The marked uops are not bogus.
2488.It Li bogus
2489The marked uops are bogus.
2490.El
2491.Pp
2492This event requires additional (upstream) events to be allocated to
2493perform the desired uop tagging.
2494The default is to select both kinds of events.
2495This event can be used for precise event based sampling.
2496.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
2497.Pq "TI event"
2498Count each DBSY or DRDY event selected by qualifier
2499.Ar flags .
2500Qualifier
2501.Ar flags
2502is a
2503.Ql +
2504separated set of the following flags:
2505.Pp
2506.Bl -tag -width indent -compact
2507.It Li drdy-drv
2508Count when this processor is driving data onto the bus.
2509.It Li drdy-own
2510Count when this processor is reading data from the bus.
2511.It Li drdy-other
2512Count when data is on the bus but not being sampled by this processor.
2513.It Li dbsy-drv
2514Count when this processor reserves the bus for use in the next cycle
2515in order to drive data.
2516.It Li dbsy-own
2517Count when some agent reserves the bus for use in the next bus cycle
2518to drive data that this processor will sample.
2519.It Li dbsy-other
2520Count when some agent reserves the bus for use in the next bus cycle
2521to drive data that this processor will not sample.
2522.El
2523.Pp
2524Flags
2525.Dq Li drdy-own
2526and
2527.Dq Li drdy-other
2528are mutually exclusive.
2529Flags
2530.Dq Li dbsy-own
2531and
2532.Dq Li dbsy-other
2533are mutually exclusive.
2534The default value for
.Ar flags
2536is
2537.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
2538.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
2539.Pq "TS event"
2540Count cycles during which the processor is not stopped.
2541Qualifier
2542.Ar flags
2543can take the following value (which is also the default):
2544.Pp
2545.Bl -tag -width indent -compact
2546.It Li running
2547Count cycles when the processor is active.
.El
2550.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
2551.Pq "TS event"
2552Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
2556.Ql +
2557characters:
2558.Pp
2559.Bl -tag -width indent -compact
2560.It Li nbogusntag
2561Count non-bogus instructions that are not tagged.
2562.It Li nbogustag
2563Count non-bogus instructions that are tagged.
2564.It Li bogusntag
2565Count bogus instructions that are not tagged.
2566.It Li bogustag
2567Count bogus instructions that are tagged.
2568.El
2569.Pp
2570The default qualifier counts all the above kinds of instructions.
2571.It Li p4-ioq-active-entries Xo
2572.Op Li ,mask= Ns Ar qualifier
2573.Op Li ,busreqtype= Ns Ar req-type
2574.Xc
2575.Pq "TS event"
2576Count the number of entries (clipped at 15) in the IOQ that are
2577active.
2578The event masks are specified by qualifier
2579.Ar qualifier
2580and
2581.Ar req-type .
2582.Pp
2583Qualifier
2584.Ar qualifier
2585is a
2586.Ql +
2587separated set of the following flags:
2588.Pp
2589.Bl -tag -width indent -compact
2590.It Li all-read
2591Count read entries.
2592.It Li all-write
2593Count write entries.
2594.It Li mem-uc
2595Count entries accessing uncacheable memory.
2596.It Li mem-wc
2597Count entries accessing write-combining memory.
2598.It Li mem-wt
2599Count entries accessing write-through memory.
2600.It Li mem-wp
Count entries accessing write-protected memory.
2602.It Li mem-wb
2603Count entries accessing write-back memory.
2604.It Li own
2605Count store requests driven by the processor (i.e., not by other
2606processors or by DMA).
2607.It Li other
2608Count store requests driven by other processors or by DMA.
2609.It Li prefetch
2610Include hardware and software prefetch requests in the count.
2611.El
2612.Pp
2613The default value for
2614.Ar qualifier
2615is to enable all the above flags.
2616.Pp
2617The
2618.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
2621The default is 0.
2622.Pp
2623The
2624.Dq Li edge
2625qualifier should not be used when counting cycles with this event.
2626The exact behaviour of this event depends on the processor revision.
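.Pp
Because
.Ar req-type
is a 5-bit number, a caller can reject out-of-range values before building
the event specifier.
The C sketch below does so; the helper
.Fn ioq_build_spec
is hypothetical, and writing the request type in decimal is an assumption,
since this manual page does not state the accepted number format.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: format
 * "event,mask=...,busreqtype=N".  Returns 0 on success, -1 on bad input
 * or if the buffer is too small.
 */
static int
ioq_build_spec(char *buf, size_t len, const char *event, const char *mask,
    unsigned req_type)
{
	int n;

	assert(buf != NULL && event != NULL && mask != NULL);
	/* The request type must fit in 5 bits. */
	if (req_type > 31)
		return (-1);
	/* A ',' would start a new qualifier, so a mask cannot contain one. */
	if (strchr(mask, ',') != NULL)
		return (-1);
	n = snprintf(buf, len, "%s,mask=%s,busreqtype=%u", event, mask,
	    req_type);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```
.Ed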
2627.It Li p4-ioq-allocation Xo
2628.Op Li ,mask= Ns Ar qualifier
2629.Op Li ,busreqtype= Ns Ar req-type
2630.Xc
2631.Pq "TS event"
2632Count various types of transactions on the bus matching the flags set
2633in
2634.Ar qualifier
2635and
2636.Ar req-type .
2637.Pp
2638Qualifier
2639.Ar qualifier
2640is a
2641.Ql +
2642separated set of the following flags:
2643.Pp
2644.Bl -tag -width indent -compact
2645.It Li all-read
2646Count read entries.
2647.It Li all-write
2648Count write entries.
2649.It Li mem-uc
2650Count entries accessing uncacheable memory.
2651.It Li mem-wc
2652Count entries accessing write-combining memory.
2653.It Li mem-wt
2654Count entries accessing write-through memory.
2655.It Li mem-wp
Count entries accessing write-protected memory.
2657.It Li mem-wb
2658Count entries accessing write-back memory.
2659.It Li own
2660Count store requests driven by the processor (i.e., not by other
2661processors or by DMA).
2662.It Li other
2663Count store requests driven by other processors or by DMA.
2664.It Li prefetch
2665Include hardware and software prefetch requests in the count.
2666.El
2667.Pp
2668The default value for
2669.Ar qualifier
2670is to enable all the above flags.
2671.Pp
2672The
2673.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
2676The default is 0.
2677.Pp
2678The
2679.Dq Li edge
2680qualifier is normally used with this event to prevent multiple
2681counting.
2682The exact behaviour of this event depends on the processor revision.
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
2684.Pq "TS event"
Count translations using the instruction translation look-aside
2686buffer.
2687The
2688.Ar qualifier
2689argument is a list of the following strings separated by
2690.Ql +
characters:
2692.Pp
2693.Bl -tag -width indent -compact
2694.It Li hit
2695Count ITLB hits.
2696.It Li miss
2697Count ITLB misses.
2698.It Li hit-uc
2699Count uncacheable ITLB hits.
2700.El
2701.Pp
2702If no
2703.Ar qualifier
is specified, the default is to count all three kinds of ITLB
translations.
2706.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
2707.Pq "TS event"
2708Count replayed events at the load port.
2709Qualifier
2710.Ar qualifier
2711can take on one value:
2712.Pp
2713.Bl -tag -width indent -compact
2714.It Li split-ld
2715Count split loads.
2716.El
2717.Pp
2718The default value for
2719.Ar qualifier
2720is
2721.Dq Li split-ld .
2722.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
2723.Pq "TS event"
2724Count mispredicted IA-32 branch instructions.
2725Qualifier
2726.Ar flags
2727can take the following value (which is also the default):
2728.Pp
2729.Bl -tag -width indent -compact
2730.It Li nbogus
2731Count non-bogus retired branch instructions.
2732.El
2733.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
2734.Pq "TS event"
2735Count the number of pipeline clears seen by the processor.
Qualifier
2737.Ar flags
2738is a list of the following strings separated by
2739.Ql +
2740characters:
2741.Pp
2742.Bl -tag -width indent -compact
2743.It Li clear
2744Count for a portion of the many cycles when the machine is being
2745cleared for any reason.
2746.It Li moclear
2747Count machine clears due to memory ordering issues.
2748.It Li smclear
2749Count machine clears due to self-modifying code.
2750.El
2751.Pp
2752Use qualifier
2753.Dq Li edge
2754to get a count of occurrences of machine clears.
2755The default qualifier is
2756.Dq Li clear .
2757.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
2758.Pq "TS event"
2759Count the cancelling of various kinds of requests in the data cache
2760address control unit of the CPU.
2761The qualifier
2762.Ar event-list
2763is a list of the following strings separated by
2764.Ql +
2765characters:
2766.Pp
2767.Bl -tag -width indent -compact
2768.It Li st-rb-full
2769Requests cancelled because no store request buffer was available.
2770.It Li 64k-conf
2771Requests that conflict due to 64K aliasing.
2772.El
2773.Pp
2774If
2775.Ar event-list
2776is not specified, then the default is to count both kinds of events.
2777.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
2778.Pq "TS event"
2779Count the completion of load split, store split, uncacheable split and
2780uncacheable load operations selected by qualifier
2781.Ar event-list .
2782The qualifier
2783.Ar event-list
2784is a
2785.Ql +
2786separated list of the following flags:
2787.Pp
2788.Bl -tag -width indent -compact
2789.It Li lsc
2790Count load splits completed, excluding loads from uncacheable or
2791write-combining areas.
2792.It Li ssc
2793Count any split stores completed.
2794.El
2795.Pp
2796The default is to count both kinds of operations.
2797.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
2798.Pq "TS event"
2799Count load replays triggered by the memory order buffer.
2800Qualifier
2801.Ar qualifier
2802can be a
2803.Ql +
2804separated list of the following flags:
2805.Pp
2806.Bl -tag -width indent -compact
2807.It Li no-sta
2808Count replays because of unknown store addresses.
2809.It Li no-std
2810Count replays because of unknown store data.
2811.It Li partial-data
2812Count replays because of partially overlapped data accesses between
2813load and store operations.
2814.It Li unalgn-addr
2815Count replays because of mismatches in the lower 4 bits of load and
2816store operations.
2817.El
2818.Pp
2819The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
2821.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
2822.Pq "TI event"
2823Count packed double-precision uops.
2824Qualifier
2825.Ar flags
2826can take the following value (which is also the default):
2827.Pp
2828.Bl -tag -width indent -compact
2829.It Li all
2830Count all uops operating on packed double-precision operands.
2831.El
2832.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
2833.Pq "TI event"
2834Count packed single-precision uops.
2835Qualifier
2836.Ar flags
2837can take the following value (which is also the default):
2838.Pp
2839.Bl -tag -width indent -compact
2840.It Li all
2841Count all uops operating on packed single-precision operands.
2842.El
2843.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
2844.Pq "TI event"
2845Count page walks performed by the page miss handler.
2846Qualifier
2847.Ar qualifier
2848can be a
2849.Ql +
2850separated list of the following keywords:
2851.Pp
2852.Bl -tag -width indent -compact
2853.It Li dtmiss
2854Count page walks for data TLB misses.
2855.It Li itmiss
2856Count page walks for instruction TLB misses.
2857.El
2858.Pp
2859The default value for
2860.Ar qualifier
2861is
2862.Dq Li dtmiss+itmiss .
2863.It Li p4-replay-event Op Li ,mask= Ns Ar flags
2864.Pq "TS event"
2865Count the retirement of tagged uops selected through the replay
2866tagging mechanism.
2867Qualifier
2868.Ar flags
2869contains a
2870.Ql +
2871separated set of the following strings:
2872.Pp
2873.Bl -tag -width indent -compact
2874.It Li nbogus
2875The marked uops are not bogus.
2876.It Li bogus
2877The marked uops are bogus.
2878.El
2879.Pp
2880This event requires additional (upstream) events to be allocated to
2881perform the desired uop tagging.
2882The default qualifier counts both kinds of uops.
2883This event can be used for precise event based sampling.
2884.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
2885.Pq "TS event"
2886Count the occurrence or latency of stalls in the allocator.
2887Qualifier
2888.Ar flags
2889can take the following value (which is also the default):
2890.Pp
2891.Bl -tag -width indent -compact
2892.It Li sbfull
2893A stall due to the lack of store buffers.
2894.El
2895.It Li p4-response
2896.Pq "TI event"
2897Count different types of responses.
2898Further documentation on this event is not available.
2899.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
2900.Pq "TS event"
2901Count branches retired.
2902Qualifier
2903.Ar flags
2904contains a
2905.Ql +
2906separated list of strings:
2907.Pp
2908.Bl -tag -width indent -compact
2909.It Li conditional
2910Count conditional jumps.
2911.It Li call
2912Count direct and indirect call branches.
2913.It Li return
2914Count return branches.
2915.It Li indirect
2916Count returns, indirect calls or indirect jumps.
2917.El
2918.Pp
2919The default qualifier counts all the above branch types.
2920.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
2921.Pq "TS event"
2922Count mispredicted branches retired.
2923Qualifier
2924.Ar flags
2925contains a
2926.Ql +
2927separated list of strings:
2928.Pp
2929.Bl -tag -width indent -compact
2930.It Li conditional
2931Count conditional jumps.
2932.It Li call
2933Count indirect call branches.
2934.It Li return
2935Count return branches.
2936.It Li indirect
2937Count returns, indirect calls or indirect jumps.
2938.El
2939.Pp
2940The default qualifier counts all the above branch types.
2941.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
2942.Pq "TI event"
2943Count the number of scalar double-precision uops.
2944Qualifier
2945.Ar flags
2946can take the following value (which is also the default):
2947.Pp
2948.Bl -tag -width indent -compact
2949.It Li all
Count all uops operating on scalar double-precision operands.
2951.El
2952.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
2953.Pq "TI event"
2954Count the number of scalar single-precision uops.
2955Qualifier
2956.Ar flags
2957can take the following value (which is also the default):
2958.Pp
2959.Bl -tag -width indent -compact
2960.It Li all
2961Count all uops operating on scalar single-precision operands.
2962.El
2963.It Li p4-snoop
2964.Pq "TI event"
2965Count snoop traffic.
2966Further documentation on this event is not available.
2967.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
2968.Pq "TI event"
2969Count the number of times an assist is required to handle problems
2970with the operands for SSE and SSE2 operations.
2971Qualifier
2972.Ar flags
2973can take the following value (which is also the default):
2974.Pp
2975.Bl -tag -width indent -compact
2976.It Li all
2977Count assists for all SSE and SSE2 uops.
2978.El
2979.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
2980.Pq "TS event"
2981Count events replayed at the store port.
2982Qualifier
2983.Ar qualifier
2984can take on one value:
2985.Pp
2986.Bl -tag -width indent -compact
2987.It Li split-st
2988Count split stores.
2989.El
2990.Pp
2991The default value for
2992.Ar qualifier
2993is
2994.Dq Li split-st .
2995.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
2996.Pq "TI event"
2997Count the duration in cycles of operating modes of the trace cache and
2998decode engine.
2999The desired operating mode is selected by
3000.Ar qualifier ,
3001which is a list of the following strings separated by
3002.Ql +
3003characters:
3004.Pp
3005.Bl -tag -width indent -compact
3006.It Li DD
3007Both logical processors are in deliver mode.
3008.It Li DB
3009Logical processor 0 is in deliver mode while logical processor 1 is in
3010build mode.
3011.It Li DI
3012Logical processor 0 is in deliver mode while logical processor 1 is
3013halted, or in machine clear, or transitioning to a long microcode
3014flow.
3015.It Li BD
3016Logical processor 0 is in build mode while logical processor 1 is in
3017deliver mode.
3018.It Li BB
3019Both logical processors are in build mode.
3020.It Li BI
3021Logical processor 0 is in build mode while logical processor 1 is
3022halted, or in machine clear or transitioning to a long microcode
3023flow.
3024.It Li ID
3025Logical processor 0 is halted, or in machine clear or transitioning to
3026a long microcode flow while logical processor 1 is in deliver mode.
3027.It Li IB
3028Logical processor 0 is halted, or in machine clear or transitioning to
3029a long microcode flow while logical processor 1 is in build mode.
3030.El
3031.Pp
3032If there is only one logical processor in the processor package then
3033the qualifier for logical processor 1 is ignored.
3034If no qualifier is specified, the default qualifier is
3035.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
3036.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
3037.Pq "TI event"
3038Count the number of times uop delivery changed from the trace cache to
3039MS ROM.
3040Qualifier
3041.Ar flags
3042can take the following value (which is also the default):
3043.Pp
3044.Bl -tag -width indent -compact
3045.It Li cisc
3046Count TC to MS transfers.
3047.El
3048.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
3049.Pq "TS event"
3050Count the number of valid uops written to the uop queue.
3051Qualifier
3052.Ar flags
3053is a list of the following strings, separated by
3054.Ql +
3055characters:
3056.Pp
3057.Bl -tag -width indent -compact
3058.It Li from-tc-build
3059Count uops being written from the trace cache in build mode.
3060.It Li from-tc-deliver
3061Count uops being written from the trace cache in deliver mode.
3062.It Li from-rom
3063Count uops being written from microcode ROM.
3064.El
3065.Pp
3066The default qualifier counts all the above kinds of uops.
3067.It Li p4-uop-type Op Li ,mask= Ns Ar flags
3068.Pq "TS event"
3069This event is used in conjunction with the front-end at-retirement
3070mechanism to tag load and store uops.
Qualifier
3072.Ar flags
3073comprises the following strings separated by
3074.Ql +
3075characters:
3076.Pp
3077.Bl -tag -width indent -compact
3078.It Li tagloads
3079Mark uops that are load operations.
3080.It Li tagstores
3081Mark uops that are store operations.
3082.El
3083.Pp
3084The default qualifier counts both kinds of uops.
3085.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
3086.Pq "TS event"
3087Count uops retired during a clock cycle.
3088Qualifier
3089.Ar flags
3090comprises the following strings separated by
3091.Ql +
3092characters:
3093.Pp
3094.Bl -tag -width indent -compact
3095.It Li nbogus
3096Count marked uops that are not bogus.
3097.It Li bogus
3098Count marked uops that are bogus.
3099.El
3100.Pp
3101The default qualifier counts both kinds of uops.
3102.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
3103.Pq "TI event"
3104Count write-combining buffer operations.
3105Qualifier
3106.Ar flags
3107contains the following strings separated by
3108.Ql +
3109characters:
3110.Pp
3111.Bl -tag -width indent -compact
3112.It Li wcb-evicts
3113WC buffer evictions due to any cause.
3114.It Li wcb-full-evict
3115WC buffer evictions due to no WC buffer being available.
3116.El
3117.Pp
The default qualifier counts both kinds of evictions.
3119.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
3120.Pq "TS event"
3121Count the retirement of x87 instructions that required special
3122handling.
3123Qualifier
3124.Ar flags
3125contains the following strings separated by
3126.Ql +
3127characters:
3128.Pp
3129.Bl -tag -width indent -compact
3130.It Li fpsu
3131Count instructions that saw an FP stack underflow.
3132.It Li fpso
3133Count instructions that saw an FP stack overflow.
3134.It Li poao
3135Count instructions that saw an x87 output overflow.
3136.It Li poau
3137Count instructions that saw an x87 output underflow.
3138.It Li prea
3139Count instructions that needed an x87 input assist.
3140.El
3141.Pp
3142The default qualifier counts all the above types of instruction
3143retirements.
3144.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
3145.Pq "TI event"
3146Count x87 floating-point uops.
3147Qualifier
3148.Ar flags
3149can take the following value (which is also the default):
3150.Pp
3151.Bl -tag -width indent -compact
3152.It Li all
3153Count all x87 floating-point uops.
3154.El
3155.Pp
If an instruction contains more than one x87 floating-point uop, then
all of its x87 floating-point uops are counted.
3158This event does not count x87 floating-point data movement operations.
3159.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
3160.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
3163This event does not count integer move uops.
3164Qualifier
3165.Ar flags
3166may contain the following keywords separated by
3167.Ql +
3168characters:
3169.Pp
3170.Bl -tag -width indent -compact
3171.It Li allp0
3172Count all x87 and SIMD store and move uops.
3173.It Li allp2
3174Count all x87 and SIMD load uops.
3175.El
3176.Pp
3177The default is to count all uops.
3178.Pq Errata
This event may be affected by processor erratum N43.
3180.El
3181.Ss "Cascading P4 PMCs"
Support for cascading PMCs is currently incomplete.
3183While individual event counters may be allocated with a
3184.Dq Li cascade
3185qualifier, the current API does not offer the ability
3186to name and allocate all the resources needed for a
3187cascaded event counter pair in a single operation.
3188.Ss "Precise Event Based Sampling"
3189Support for precise event based sampling is currently
3190unimplemented.
3191.Sh COMPATIBILITY
3192The interface between the
3193.Nm pmc
3194library and the
3195.Xr hwpmc 4
3196driver is intended to be private to the implementation and may
3197change.
3198In order to ease forward compatibility with future versions of the
3199.Xr hwpmc 4
3200driver, applications are urged to dynamically link with the
3201.Nm pmc
3202library.
3203.Pp
3204The
3205.Nm pmc
3206API is
3207.Ud
3208.Sh SEE ALSO
3209.Xr pmclog 3 ,
3210.Xr hwpmc 4 ,
3211.Xr pmccontrol 8 ,
3212.Xr pmcstat 8
3213.Sh HISTORY
3214The
3215.Nm pmc
3216library first appeared in
3217.Fx 6.0 .
3218.Sh AUTHORS
3219The
.Nm pmc
library was written by
3222.An "Joseph Koshy"
3223.Aq jkoshy@FreeBSD.org .