1.\" Copyright (c) 2003-2008 Joseph Koshy.  All rights reserved.
2.\"
3.\" Redistribution and use in source and binary forms, with or without
4.\" modification, are permitted provided that the following conditions
5.\" are met:
6.\" 1. Redistributions of source code must retain the above copyright
7.\"    notice, this list of conditions and the following disclaimer.
8.\" 2. Redistributions in binary form must reproduce the above copyright
9.\"    notice, this list of conditions and the following disclaimer in the
10.\"    documentation and/or other materials provided with the distribution.
11.\"
12.\" This software is provided by Joseph Koshy ``as is'' and
13.\" any express or implied warranties, including, but not limited to, the
14.\" implied warranties of merchantability and fitness for a particular purpose
15.\" are disclaimed.  in no event shall Joseph Koshy be liable
16.\" for any direct, indirect, incidental, special, exemplary, or consequential
17.\" damages (including, but not limited to, procurement of substitute goods
18.\" or services; loss of use, data, or profits; or business interruption)
19.\" however caused and on any theory of liability, whether in contract, strict
20.\" liability, or tort (including negligence or otherwise) arising in any way
21.\" out of the use of this software, even if advised of the possibility of
22.\" such damage.
23.\"
24.\" $FreeBSD$
25.\"
26.Dd March 14, 2008
27.Os
28.Dt PMC 3
29.Sh NAME
30.Nm pmc
31.Nd library for accessing hardware performance monitoring counters
32.Sh LIBRARY
33.Lb libpmc
34.Sh SYNOPSIS
35.In pmc.h
36.Sh DESCRIPTION
37The
38.Lb libpmc
39provides a programming interface that allows applications to use
40hardware performance counters to gather performance data about
41specific processes or for the system as a whole.
42The library is implemented using the lower-level facilities offered by
43the
44.Xr hwpmc 4
45driver.
46.Ss Key Concepts
47Performance monitoring counters (PMCs) are represented by the library
48using a software abstraction.
49These
50.Dq abstract
PMCs can have one of two scopes:
52.Bl -bullet
53.It
54System scope.
55These PMCs measure events in a whole-system manner, i.e., independent
56of the currently executing thread.
57System scope PMCs are allocated on specific CPUs and do not
58migrate between CPUs.
Non-privileged processes are allowed to allocate system scope PMCs if the
60.Xr hwpmc 4
61sysctl tunable:
62.Va security.bsd.unprivileged_syspmcs
63is non-zero.
64.It
65Process scope.
66These PMCs only measure hardware events when the processes they are
67attached to are executing on a CPU.
68In an SMP system, process scope PMCs migrate between CPUs along with
69their target processes.
70.El
71.Pp
72Orthogonal to PMC scope, PMCs may be allocated in one of two
73operational modes:
74.Bl -bullet
75.It
76Counting PMCs measure events according to their scope
77(system or process).
78The application needs to explicitly read these counters
79to retrieve their value.
80.It
81Sampling PMCs cause the CPU to be periodically interrupted
82and information about its state of execution to be collected.
83Sampling PMCs are used to profile specific processes and kernel
84threads or to profile the system as a whole.
85.El
86.Pp
87The scope and operational mode for a software PMC are specified at
88PMC allocation time.
89An application is allowed to allocate multiple PMCs subject
90to availability of hardware resources.
91.Pp
92The library uses human-readable strings to name the event being
93measured by hardware.
94The syntax used for specifying a hardware event along with additional
95event specific qualifiers (if any) is described in detail in section
96.Sx "EVENT SPECIFIERS"
97below.
98.Pp
99PMCs are associated with the process that allocated them and
100will be automatically reclaimed by the system when the process exits.
101Additionally, process-scope PMCs have to be attached to one or more
102target processes before they can perform measurements.
103A process-scope PMC may be attached to those target processes
104that its owner process would otherwise be permitted to debug.
105An owner process may attach PMCs to itself allowing
106it to measure its own behavior.
107Additionally, on some machine architectures, such self-attached PMCs
108may be read cheaply using specialized instructions supported by the
109processor.
110.Pp
111Certain kinds of PMCs require that a log file be configured before
112they may be started.
113These include:
114.Bl -bullet -compact
115.It
116System scope sampling PMCs.
117.It
118Process scope sampling PMCs.
119.It
120Process scope counting PMCs that have been configured to report PMC
121readings on process context switches or process exits.
122.El
Up to one log file may be configured per owner process.
124Events logged to a log file may be subsequently analyzed using the
125.Xr pmclog 3
126family of functions.
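.Pp
The log file rule above can be modelled in a few lines of C.
This is a minimal sketch for illustration only: the enumerations and the
needs_logfile() helper are hypothetical names introduced here and are not
part of the library, which defines its own types.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; libpmc defines its own. */
enum sketch_scope { SCOPE_SYSTEM, SCOPE_PROCESS };
enum sketch_mode { MODE_COUNTING, MODE_SAMPLING };

/*
 * Return true if a PMC of the given scope and mode needs a log file
 * before it may be started.  logs_readings is true when a counting
 * PMC has been configured to report its readings on process context
 * switches or process exits.
 */
bool
needs_logfile(enum sketch_scope scope, enum sketch_mode mode,
    bool logs_readings)
{
	/* Sampling PMCs need a log file in either scope. */
	if (mode == MODE_SAMPLING)
		return (true);
	/* Counting PMCs need one only in process scope with logging. */
	return (scope == SCOPE_PROCESS && logs_readings);
}
```

Under this model a plain process scope counting PMC, read explicitly by
the application, can be started without any log file.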
127.Ss Supported CPUs
128The CPUs known to the PMC library are named by the
129.Vt "enum pmc_cputype"
130enumeration.
131Supported CPUs include:
132.Bl -tag -width PMC_CPU_INTEL_PIII -compact
133.It PMC_CPU_AMD_K7
134.Tn "AMD Athlon"
135CPUs.
136.It PMC_CPU_AMD_K8
137.Tn "AMD Athlon64"
138CPUs.
139.It PMC_CPU_INTEL_P5
140.Tn Intel
141.Tn "Pentium"
142CPUs.
143.It PMC_CPU_INTEL_P6
144.Tn Intel
145.Tn "Pentium Pro"
146CPUs.
147.It PMC_CPU_INTEL_PII
148.Tn "Intel Pentium II"
149CPUs.
150.It PMC_CPU_INTEL_PIII
151.Tn "Intel Pentium III"
152CPUs.
153.It PMC_CPU_INTEL_PM
154.Tn "Intel Pentium M"
155CPUs.
156.It PMC_CPU_INTEL_PIV
157.Tn "Intel Pentium 4"
158CPUs.
159.El
160.Ss Supported PMCs
PMCs supported by this library are named by the
162.Vt enum pmc_class
163enumeration.
164Supported PMC kinds include:
165.Bl -tag -width PMC_CLASS_TSC -compact
166.It PMC_CLASS_TSC
167The timestamp counter on i386 and amd64 architecture CPUs.
168.It PMC_CLASS_K7
169Programmable hardware counters present in
170.Tn "AMD Athlon"
171CPUs.
172.It PMC_CLASS_K8
173Programmable hardware counters present in
174.Tn "AMD Athlon64"
175CPUs.
176.It PMC_CLASS_P5
177Programmable hardware counters present in
178.Tn Intel
179.Tn Pentium
180CPUs.
181.It PMC_CLASS_P6
182Programmable hardware counters present in
183.Tn Intel
184.Tn "Pentium Pro" ,
185.Tn "Pentium II" ,
186.Tn "Pentium III" ,
187.Tn "Celeron" ,
188and
189.Tn "Pentium M"
190CPUs.
191.It PMC_CLASS_P4
192Programmable hardware counters present in
193.Tn "Intel Pentium 4"
194CPUs.
195.El
196.Ss PMC Capabilities
197.Pp
198Capabilities of performance monitoring hardware are denoted using
199the
200.Vt "enum pmc_caps"
201enumeration.
202Supported capabilities include:
203.Bl -tag -width "PMC_CAP_INTERRUPT" -compact
204.It PMC_CAP_EDGE
The ability to count negated-to-asserted transitions of the hardware
206conditions being probed for.
207.It PMC_CAP_INTERRUPT
208The ability to interrupt the CPU.
209.It PMC_CAP_INVERT
210The ability to invert the sense of the hardware conditions being
211measured.
212.It PMC_CAP_READ
213PMC hardware allows the CPU to read performance counters.
214.It PMC_CAP_QUALIFIER
The hardware allows monitored events to be further qualified in some
216system dependent way.
217.It PMC_CAP_SYSTEM
218The ability to restrict counting of hardware events to when the CPU is
219running privileged code.
220.It PMC_CAP_THRESHOLD
221The ability to ignore simultaneous hardware events below a
222programmable threshold.
223.It PMC_CAP_USER
224The ability to restrict counting of hardware events to those when the
225CPU is running unprivileged code.
226.It PMC_CAP_WRITE
PMC hardware allows CPUs to write to counters.
228.El
229.Ss Functional Grouping
230This section contains a brief overview of the available functionality
231in the PMC library.
232Each function listed here is described further in its own manual page.
233.Bl -tag -width indent
234.It Administration
235.Bl -tag -compact
236.It Fn pmc_disable , Fn pmc_enable
237Administratively disable (enable) specific performance monitoring
238counter hardware.
239Counters that are disabled will not be available to applications to
240use.
241.El
242.It "Convenience Functions"
243.Bl -tag -compact
244.It Fn pmc_event_names_of_class
245Returns a list of event names supported by a given PMC type.
246.It Fn pmc_name_of_capability
247Convert a
248.Dv PMC_CAP_*
249flag to a human-readable string.
250.It Fn pmc_name_of_class
251Convert a
252.Dv PMC_CLASS_*
253constant to a human-readable string.
254.It Fn pmc_name_of_cputype
255Return a human-readable name for a CPU type.
256.It Fn pmc_name_of_disposition
257Return a human-readable string describing a PMC's disposition.
258.It Fn pmc_name_of_event
259Convert a numeric event code to a human-readable string.
260.It Fn pmc_name_of_mode
261Convert a
262.Dv PMC_MODE_*
263constant to a human-readable name.
264.It Fn pmc_name_of_state
265Return a human-readable string describing a PMC's current state.
266.El
267.It "Library Initialization"
268.Bl -tag -compact
269.It Fn pmc_init
270Initialize the library.
271This function must be called before any other library function.
272.El
273.It "Log File Handling"
274.Bl -tag -compact
275.It Fn pmc_configure_logfile
276Configure a log file for
277.Xr hwpmc 4
278to write logged events to.
279.It Fn pmc_flush_logfile
280Flush all pending log data in
281.Xr hwpmc 4 Ns Ap s
282buffers.
283.It Fn pmc_writelog
284Append arbitrary user data to the current log file.
285.El
286.It "PMC Management"
287.Bl -tag -compact
288.It Fn pmc_allocate , Fn pmc_release
289Allocate (free) a PMC.
290.It Fn pmc_attach , Fn pmc_detach
291Attach (detach) a process scope PMC to a target.
292.It Fn pmc_read , Fn pmc_write , Fn pmc_rw
293Read (write) a value from (to) a PMC.
294.It Fn pmc_start , Fn pmc_stop
295Start (stop) a software PMC.
296.It Fn pmc_set
297Set the reload value for a sampling PMC.
298.El
299.It "Queries"
300.Bl -tag -compact
301.It Fn pmc_capabilities
302Retrieve the capabilities for a given PMC.
303.It Fn pmc_cpuinfo
304Retrieve information about the CPUs and PMC hardware present in the
305system.
306.It Fn pmc_get_driver_stats
307Retrieve statistics maintained by
308.Xr hwpmc 4 .
309.It Fn pmc_ncpu
310Determine the number of CPUs in the system.
311.It Fn pmc_npmc
312Return the number of hardware PMCs present in a given CPU.
313.It Fn pmc_pmcinfo
314Return information about the state of a given CPU's PMCs.
315.It Fn pmc_width
316Determine the width of a hardware counter in bits.
317.El
318.It "x86 Architecture Specific API"
319.Bl -tag -compact
320.It Fn pmc_get_msr
321Returns the processor model specific register number
322associated with
323.Fa pmc .
324Applications may then use the x86
325.Ic RDPMC
326instruction to directly read the contents of the PMC.
327.El
328.El
329.Ss Signal Handling Requirements
330Applications using PMCs are required to handle the following signals:
331.Bl -tag -width ".Dv SIGBUS"
332.It Dv SIGBUS
333When the
334.Xr hwpmc 4
335module is unloaded using
336.Xr kldunload 8 ,
337processes that have PMCs allocated to them will be sent a
338.Dv SIGBUS
339signal.
340.It Dv SIGIO
341The
342.Xr hwpmc 4
343driver will send a PMC owning process a
344.Dv SIGIO
345signal if:
346.Bl -bullet
347.It
Any process-mode PMC allocated by it loses all its
target processes.
.It
The driver encounters an error when writing log data to a
352configured log file.
353This error may be retrieved by a subsequent call to
354.Fn pmc_flush_logfile .
355.El
356.El
357.Ss Typical Program Flow
358.Bl -enum
359.It
360An application would first invoke function
361.Fn pmc_init
362to allow the library to initialize itself.
363.It
364Signal handling would then be set up.
365.It
366Next the application would allocate the PMCs it desires using function
367.Fn pmc_allocate .
368.It
369Initial values for PMCs may be set using function
370.Fn pmc_set .
371.It
372If a log file is necessary for the PMCs to work, it would
373be configured using function
374.Fn pmc_configure_logfile .
375.It
376Process scope PMCs would then be attached to their target processes
377using function
378.Fn pmc_attach .
379.It
380The PMCs would then be started using function
381.Fn pmc_start .
382.It
383Once started, the values of counting PMCs may be read using function
.Fn pmc_read .
385For PMCs that write events to the log file, this logged data would be
386read and parsed using the
387.Xr pmclog 3
388family of functions.
389.It
390PMCs are stopped using function
391.Fn pmc_stop ,
392and process scope PMCs are detached from their targets using
393function
394.Fn pmc_detach .
395.It
Before the process exits, it may release its PMCs using function
397.Fn pmc_release .
398Any configured log file may be closed using function
399.Fn pmc_configure_logfile .
400.El
401.Sh EVENT SPECIFIERS
Event specifiers are strings comprising an event name, followed by
403optional parameters modifying the semantics of the hardware event
404being probed.
405Event names are PMC architecture dependent, but the PMC library defines
406machine independent aliases for commonly used events.
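.Pp
As an illustration of the specifier syntax, the following sketch assembles
an event specifier from an event name and a list of qualifiers.
It assumes that qualifiers are appended to the event name with a comma,
as in the
.Dq Li name,unitmask= Ns Ar mask
forms used later in this page.
The build_event_spec() helper is a hypothetical name and is not part of
the library.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * build_event_spec() is a hypothetical helper, not a libpmc function.
 * It writes "event,qual1,qual2,..." into buf.
 * Returns 0 on success, or -1 if the result does not fit in len bytes.
 */
int
build_event_spec(char *buf, size_t len, const char *event,
    const char *const *quals, size_t nquals)
{
	size_t i, used;
	int n;

	/* Start with the bare event name. */
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;

	/* Append each qualifier, preceded by a comma. */
	for (i = 0; i < nquals; i++) {
		n = snprintf(buf + used, len - used, ",%s", quals[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

Given the event name k7-dc-refills-from-l2 and the qualifiers
unitmask=m+o and usr, the helper produces the string
k7-dc-refills-from-l2,unitmask=m+o,usr.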
407.Ss Event Name Aliases
408Event name aliases are CPU architecture independent names for commonly
409used events.
410The following aliases are known to this version of the
411.Nm pmc
412library:
413.Bl -tag -width indent
414.It Li branches
415Measure the number of branches retired.
416.It Li branch-mispredicts
417Measure the number of retired branches that were mispredicted.
418.It Li cycles
419Measure processor cycles.
420This event is implemented using the processor's Time Stamp Counter
421register.
422.It Li dc-misses
423Measure the number of data cache misses.
424.It Li ic-misses
425Measure the number of instruction cache misses.
426.It Li instructions
427Measure the number of instructions retired.
428.It Li interrupts
429Measure the number of interrupts seen.
430.It Li unhalted-cycles
431Measure the number of cycles the processor is not in a halted
432or sleep state.
433.El
434.Ss Time Stamp Counter (TSC)
435The timestamp counter is a monotonically non-decreasing counter that
436counts processor cycles.
437.Pp
438In the i386 architecture, this counter may
439be selected by requesting an event with event specifier
440.Dq Li tsc .
441The
442.Dq Li tsc
443event does not support any further qualifiers.
444It can only be allocated in system-wide counting mode,
445and is a read-only counter.
446Multiple processes are allowed to allocate the TSC.
447Once allocated, it may be read using the
448.Fn pmc_read
449function, or by using the RDTSC instruction.
450.Ss AMD (K7) PMCs
451These PMCs are present in the
452.Tn "AMD Athlon"
453series of CPUs and are documented in:
454.Rs
455.%B "AMD Athlon Processor x86 Code Optimization Guide"
456.%N "Publication No. 22007"
457.%D "February 2002"
458.%Q "Advanced Micro Devices, Inc."
459.Re
460.Pp
461Event specifiers for AMD K7 PMCs can have the following optional
462qualifiers:
463.Bl -tag -width indent
464.It Li count= Ns Ar value
465Configure the counter to increment only if the number of configured
466events measured in a cycle is greater than or equal to
467.Ar value .
468.It Li edge
469Configure the counter to only count negated-to-asserted transitions
470of the conditions expressed by the other qualifiers.
471In other words, the counter will increment only once whenever a given
472condition becomes true, irrespective of the number of clocks during
473which the condition remains true.
474.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
478number of events per cycle is less than the value specified by
479the
480.Dq Li count
481qualifier.
482.It Li os
483Configure the PMC to count events happening at privilege level 0.
484.It Li unitmask= Ns Ar mask
485This qualifier is used to further qualify a select few events,
486.Dq Li k7-dc-refills-from-l2 ,
487.Dq Li k7-dc-refills-from-system
488and
489.Dq Li k7-dc-writebacks .
490Here
491.Ar mask
492is a string of the following characters optionally separated by
493.Ql +
494characters:
495.Pp
496.Bl -tag -width indent -compact
497.It Li m
498Count operations for lines in the
499.Dq Modified
500state.
501.It Li o
502Count operations for lines in the
503.Dq Owner
504state.
505.It Li e
506Count operations for lines in the
507.Dq Exclusive
508state.
509.It Li s
510Count operations for lines in the
511.Dq Shared
512state.
513.It Li i
514Count operations for lines in the
515.Dq Invalid
516state.
517.El
518.Pp
519If no
520.Dq Li unitmask
qualifier is specified, the default is to count events for cache
522lines in any of the above states.
523.It Li usr
524Configure the PMC to count events occurring at privilege levels 1, 2
525or 3.
526.El
527.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
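.Pp
The
.Dq Li unitmask
syntax described above can be checked with a small parser.
The sketch below is an assumption-laden illustration: parse_k7_unitmask()
is a hypothetical helper, the flag values are arbitrary and do not
represent the hardware or library encoding, and rejecting an empty mask
is a choice made here rather than documented behavior.

```c
#include <stddef.h>

/* Illustrative flag values only; not the hardware or libpmc encoding. */
enum moesi_flag {
	MOESI_M = 1 << 0,	/* Modified */
	MOESI_O = 1 << 1,	/* Owner */
	MOESI_E = 1 << 2,	/* Exclusive */
	MOESI_S = 1 << 3,	/* Shared */
	MOESI_I = 1 << 4,	/* Invalid */
	MOESI_ALL = (1 << 5) - 1
};

/*
 * parse_k7_unitmask() is a hypothetical helper, not a libpmc function.
 * It accepts a string of the letters m, o, e, s and i, optionally
 * separated by '+', and returns the matching set of flags.
 * A NULL mask models an absent "unitmask" qualifier and selects all
 * states.  Returns -1 for an unknown character or a mask that names
 * no state at all.
 */
int
parse_k7_unitmask(const char *mask)
{
	int flags = 0;

	if (mask == NULL)
		return (MOESI_ALL);
	for (; *mask != '\0'; mask++) {
		switch (*mask) {
		case 'm': flags |= MOESI_M; break;
		case 'o': flags |= MOESI_O; break;
		case 'e': flags |= MOESI_E; break;
		case 's': flags |= MOESI_S; break;
		case 'i': flags |= MOESI_I; break;
		case '+': break;	/* optional separator */
		default: return (-1);
		}
	}
	return (flags != 0 ? flags : -1);
}
```

In this model the masks m+o and mo select the same two states, which
mirrors the optional nature of the
.Ql +
separator.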
533.Pp
534The event specifiers supported on AMD K7 PMCs are:
535.Bl -tag -width indent
536.It Li k7-dc-accesses
537Count data cache accesses.
538.It Li k7-dc-misses
539Count data cache misses.
540.It Li k7-dc-refills-from-l2 Op Li ,unitmask= Ns Ar mask
541Count data cache refills from L2 cache.
542This event may be further qualified using the
543.Dq Li unitmask
544qualifier.
545.It Li k7-dc-refills-from-system Op Li ,unitmask= Ns Ar mask
546Count data cache refills from system memory.
547This event may be further qualified using the
548.Dq Li unitmask
549qualifier.
550.It Li k7-dc-writebacks Op Li ,unitmask= Ns Ar mask
551Count data cache writebacks.
552This event may be further qualified using the
553.Dq Li unitmask
554qualifier.
555.It Li k7-l1-dtlb-miss-and-l2-dtlb-hits
556Count L1 DTLB misses and L2 DTLB hits.
557.It Li k7-l1-and-l2-dtlb-misses
558Count L1 and L2 DTLB misses.
559.It Li k7-misaligned-references
560Count misaligned data references.
561.It Li k7-ic-fetches
562Count instruction cache fetches.
563.It Li k7-ic-misses
564Count instruction cache misses.
565.It Li k7-l1-itlb-misses
566Count L1 ITLB misses that are L2 ITLB hits.
567.It Li k7-l1-l2-itlb-misses
568Count L1 (and L2) ITLB misses.
569.It Li k7-retired-instructions
570Count all retired instructions.
571.It Li k7-retired-ops
572Count retired ops.
573.It Li k7-retired-branches
574Count all retired branches (conditional, unconditional, exceptions
575and interrupts).
576.It Li k7-retired-branches-mispredicted
Count all mispredicted retired branches.
578.It Li k7-retired-taken-branches
579Count retired taken branches.
580.It Li k7-retired-taken-branches-mispredicted
581Count mispredicted taken branches that were retired.
582.It Li k7-retired-far-control-transfers
583Count retired far control transfers.
584.It Li k7-retired-resync-branches
585Count retired resync branches (non control transfer branches).
586.It Li k7-interrupts-masked-cycles
587Count the number of cycles when the processor's
588.Va IF
589flag was zero.
590.It Li k7-interrupts-masked-while-pending-cycles
591Count the number of cycles interrupts were masked while pending due
592to the processor's
593.Va IF
594flag being zero.
595.It Li k7-hardware-interrupts
596Count the number of taken hardware interrupts.
597.El
598.Ss AMD (K8) PMCs
599These PMCs are present in the
600.Tn "AMD Athlon64"
601and
602.Tn "AMD Opteron"
603series of CPUs.
604They are documented in:
605.Rs
606.%B "BIOS and Kernel Developer's Guide for the AMD Athlon(tm) 64 and AMD Opteron Processors"
607.%N "Publication No. 26094"
608.%D "April 2004"
609.%Q "Advanced Micro Devices, Inc."
610.Re
611.Pp
612Event specifiers for AMD K8 PMCs can have the following optional
613qualifiers:
614.Bl -tag -width indent
615.It Li count= Ns Ar value
616Configure the counter to increment only if the number of configured
617events measured in a cycle is greater than or equal to
618.Ar value .
619.It Li edge
620Configure the counter to only count negated-to-asserted transitions
621of the conditions expressed by the other fields.
622In other words, the counter will increment only once whenever a given
623condition becomes true, irrespective of the number of clocks during
624which the condition remains true.
625.It Li inv
Invert the sense of comparison when the
.Dq Li count
qualifier is present, making the counter increment when the
629number of events per cycle is less than the value specified by
630the
631.Dq Li count
632qualifier.
633.It Li mask= Ns Ar qualifier
634Many event specifiers for AMD K8 PMCs need to be additionally
635qualified using a mask qualifier.
636These additional qualifiers are event-specific and are documented
637along with their associated event specifiers below.
638.It Li os
639Configure the PMC to count events happening at privilege level 0.
640.It Li usr
641Configure the PMC to count events occurring at privilege levels 1, 2
642or 3.
643.El
644.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
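.Pp
The effect of the
.Dq Li count ,
.Dq Li inv
and
.Dq Li edge
qualifiers described above can be modelled in software.
The following is a simplified sketch, not driver or hardware code: both
function names are hypothetical, and the edge model assumes the condition
is negated before the first observed cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * counter_increments() is a hypothetical model of the "count" and
 * "inv" qualifiers.
 * events: number of configured events observed in one cycle.
 * count:  value given with the "count" qualifier.
 * inv:    true when the "inv" qualifier is present.
 */
bool
counter_increments(unsigned int events, unsigned int count, bool inv)
{
	if (inv)
		return (events < count);
	return (events >= count);
}

/*
 * count_edges() is a hypothetical model of the "edge" qualifier.
 * cond[i] says whether the monitored condition held in cycle i.
 * Only negated-to-asserted transitions are counted, so a condition
 * that stays true for several cycles contributes a single count.
 */
size_t
count_edges(const bool *cond, size_t ncycles)
{
	size_t i, edges = 0;
	bool prev = false;	/* assumed negated before cycle 0 */

	for (i = 0; i < ncycles; i++) {
		if (cond[i] && !prev)
			edges++;
		prev = cond[i];
	}
	return (edges);
}
```

For a condition that is false, true, true, false, true over five cycles,
count_edges() reports two transitions even though the condition held for
three cycles.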
650.Pp
651The event specifiers supported on AMD K8 PMCs are:
652.Bl -tag -width indent
653.It Li k8-bu-cpu-clk-unhalted
654Count the number of clock cycles when the CPU is not in the HLT or
655STPCLK states.
656.It Li k8-bu-fill-request-l2-miss Op Li ,mask= Ns Ar qualifier
657Count fill requests that missed in the L2 cache.
658This event may be further qualified using
659.Ar qualifier ,
660which is a
661.Ql +
662separated set of the following keywords:
663.Pp
664.Bl -tag -width indent -compact
665.It Li dc-fill
666Count data cache fill requests.
667.It Li ic-fill
668Count instruction cache fill requests.
669.It Li tlb-reload
670Count TLB reloads.
671.El
672.Pp
673The default is to count all types of requests.
674.It Li k8-bu-internal-l2-request Op Li ,mask= Ns Ar qualifier
675Count internally generated requests to the L2 cache.
676This event may be further qualified using
677.Ar qualifier ,
678which is a
679.Ql +
680separated set of the following keywords:
681.Pp
682.Bl -tag -width indent -compact
683.It Li cancelled
684Count cancelled requests.
685.It Li dc-fill
686Count data cache fill requests.
687.It Li ic-fill
688Count instruction cache fill requests.
689.It Li tag-snoop
690Count tag snoop requests.
691.It Li tlb-reload
692Count TLB reloads.
693.El
694.Pp
695The default is to count all types of requests.
696.It Li k8-dc-access
697Count data cache accesses including microcode scratchpad accesses.
698.It Li k8-dc-copyback Op Li ,mask= Ns Ar qualifier
699Count data cache copyback operations.
700This event may be further qualified using
701.Ar qualifier ,
702which is a
703.Ql +
704separated set of the following keywords:
705.Pp
706.Bl -tag -width indent -compact
707.It Li exclusive
708Count operations for lines in the
709.Dq exclusive
710state.
711.It Li invalid
712Count operations for lines in the
713.Dq invalid
714state.
715.It Li modified
716Count operations for lines in the
717.Dq modified
718state.
719.It Li owner
720Count operations for lines in the
721.Dq owner
722state.
723.It Li shared
724Count operations for lines in the
725.Dq shared
726state.
727.El
728.Pp
729The default is to count operations for lines in all the
730above states.
731.It Li k8-dc-dcache-accesses-by-locks Op Li ,mask= Ns Ar qualifier
732Count data cache accesses by lock instructions.
733This event is only available on processors of revision C or later
734vintage.
735This event may be further qualified using
736.Ar qualifier ,
737which is a
738.Ql +
739separated set of the following keywords:
740.Pp
741.Bl -tag -width indent -compact
742.It Li accesses
743Count data cache accesses by lock instructions.
744.It Li misses
745Count data cache misses by lock instructions.
746.El
747.Pp
748The default is to count all accesses.
749.It Li k8-dc-dispatched-prefetch-instructions Op Li ,mask= Ns Ar qualifier
750Count the number of dispatched prefetch instructions.
751This event may be further qualified using
752.Ar qualifier ,
753which is a
754.Ql +
755separated set of the following keywords:
756.Pp
757.Bl -tag -width indent -compact
758.It Li load
759Count load operations.
760.It Li nta
761Count non-temporal operations.
762.It Li store
763Count store operations.
764.El
765.Pp
766The default is to count all operations.
767.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-hit
768Count L1 DTLB misses that are L2 DTLB hits.
769.It Li k8-dc-l1-dtlb-miss-and-l2-dtlb-miss
770Count L1 DTLB misses that are also misses in the L2 DTLB.
771.It Li k8-dc-microarchitectural-early-cancel-of-an-access
772Count microarchitectural early cancels of data cache accesses.
773.It Li k8-dc-microarchitectural-late-cancel-of-an-access
774Count microarchitectural late cancels of data cache accesses.
775.It Li k8-dc-misaligned-data-reference
776Count misaligned data references.
777.It Li k8-dc-miss
778Count data cache misses.
779.It Li k8-dc-one-bit-ecc-error Op Li ,mask= Ns Ar qualifier
780Count one bit ECC errors found by the scrubber.
781This event may be further qualified using
782.Ar qualifier ,
783which is a
784.Ql +
785separated set of the following keywords:
786.Pp
787.Bl -tag -width indent -compact
788.It Li scrubber
789Count scrubber detected errors.
790.It Li piggyback
791Count piggyback scrubber errors.
792.El
793.Pp
794The default is to count both kinds of errors.
795.It Li k8-dc-refill-from-l2 Op Li ,mask= Ns Ar qualifier
796Count data cache refills from L2 cache.
797This event may be further qualified using
798.Ar qualifier ,
799which is a
800.Ql +
801separated set of the following keywords:
802.Pp
803.Bl -tag -width indent -compact
804.It Li exclusive
805Count operations for lines in the
806.Dq exclusive
807state.
808.It Li invalid
809Count operations for lines in the
810.Dq invalid
811state.
812.It Li modified
813Count operations for lines in the
814.Dq modified
815state.
816.It Li owner
817Count operations for lines in the
818.Dq owner
819state.
820.It Li shared
821Count operations for lines in the
822.Dq shared
823state.
824.El
825.Pp
826The default is to count operations for lines in all the
827above states.
828.It Li k8-dc-refill-from-system Op Li ,mask= Ns Ar qualifier
829Count data cache refills from system memory.
830This event may be further qualified using
831.Ar qualifier ,
832which is a
833.Ql +
834separated set of the following keywords:
835.Pp
836.Bl -tag -width indent -compact
837.It Li exclusive
838Count operations for lines in the
839.Dq exclusive
840state.
841.It Li invalid
842Count operations for lines in the
843.Dq invalid
844state.
845.It Li modified
846Count operations for lines in the
847.Dq modified
848state.
849.It Li owner
850Count operations for lines in the
851.Dq owner
852state.
853.It Li shared
854Count operations for lines in the
855.Dq shared
856state.
857.El
858.Pp
859The default is to count operations for lines in all the
860above states.
861.It Li k8-fp-dispatched-fpu-ops Op Li ,mask= Ns Ar qualifier
862Count the number of dispatched FPU ops.
863This event is supported in revision B and later CPUs.
864This event may be further qualified using
865.Ar qualifier ,
866which is a
867.Ql +
868separated set of the following keywords:
869.Pp
870.Bl -tag -width indent -compact
871.It Li add-pipe-excluding-junk-ops
872Count add pipe ops excluding junk ops.
873.It Li add-pipe-junk-ops
874Count junk ops in the add pipe.
875.It Li multiply-pipe-excluding-junk-ops
876Count multiply pipe ops excluding junk ops.
877.It Li multiply-pipe-junk-ops
878Count junk ops in the multiply pipe.
879.It Li store-pipe-excluding-junk-ops
Count store pipe ops excluding junk ops.
881.It Li store-pipe-junk-ops
882Count junk ops in the store pipe.
883.El
884.Pp
885The default is to count all types of ops.
886.It Li k8-fp-cycles-with-no-fpu-ops-retired
887Count cycles when no FPU ops were retired.
888This event is supported in revision B and later CPUs.
889.It Li k8-fp-dispatched-fpu-fast-flag-ops
890Count dispatched FPU ops that use the fast flag interface.
891This event is supported in revision B and later CPUs.
892.It Li k8-fr-decoder-empty
893Count cycles when there was nothing to dispatch (i.e., the decoder
894was empty).
895.It Li k8-fr-dispatch-stalls
896Count all dispatch stalls.
897.It Li k8-fr-dispatch-stall-for-segment-load
898Count dispatch stalls for segment loads.
899.It Li k8-fr-dispatch-stall-for-serialization
900Count dispatch stalls for serialization.
901.It Li k8-fr-dispatch-stall-from-branch-abort-to-retire
902Count dispatch stalls from branch abort to retiral.
903.It Li k8-fr-dispatch-stall-when-fpu-is-full
904Count dispatch stalls when the FPU is full.
905.It Li k8-fr-dispatch-stall-when-ls-is-full
906Count dispatch stalls when the load/store unit is full.
907.It Li k8-fr-dispatch-stall-when-reorder-buffer-is-full
908Count dispatch stalls when the reorder buffer is full.
909.It Li k8-fr-dispatch-stall-when-reservation-stations-are-full
910Count dispatch stalls when reservation stations are full.
911.It Li k8-fr-dispatch-stall-when-waiting-for-all-to-be-quiet
912Count dispatch stalls when waiting for all to be quiet.
913.\" XXX What does "waiting for all to be quiet" mean?
914.It Li k8-fr-dispatch-stall-when-waiting-far-xfer-or-resync-branch-pending
915Count dispatch stalls when a far control transfer or a resync branch
916is pending.
917.It Li k8-fr-fpu-exceptions Op Li ,mask= Ns Ar qualifier
918Count FPU exceptions.
919This event is supported in revision B and later CPUs.
920This event may be further qualified using
921.Ar qualifier ,
922which is a
923.Ql +
924separated set of the following keywords:
925.Pp
926.Bl -tag -width indent -compact
927.It Li sse-and-x87-microtraps
928Count SSE and x87 microtraps.
929.It Li sse-reclass-microfaults
Count SSE reclass microfaults.
.It Li sse-retype-microfaults
Count SSE retype microfaults.
933.It Li x87-reclass-microfaults
934Count x87 reclass microfaults.
935.El
936.Pp
937The default is to count all types of exceptions.
938.It Li k8-fr-interrupts-masked-cycles
Count cycles when interrupts were masked (i.e., when the CPU RFLAGS field IF was zero).
940.It Li k8-fr-interrupts-masked-while-pending-cycles
Count cycles when interrupts were masked while pending (i.e., cycles
942when INTR was asserted while CPU RFLAGS field IF was zero).
943.It Li k8-fr-number-of-breakpoints-for-dr0
944Count the number of breakpoints for DR0.
945.It Li k8-fr-number-of-breakpoints-for-dr1
946Count the number of breakpoints for DR1.
947.It Li k8-fr-number-of-breakpoints-for-dr2
948Count the number of breakpoints for DR2.
949.It Li k8-fr-number-of-breakpoints-for-dr3
950Count the number of breakpoints for DR3.
951.It Li k8-fr-retired-branches
952Count retired branches including exceptions and interrupts.
953.It Li k8-fr-retired-branches-mispredicted
954Count mispredicted retired branches.
955.It Li k8-fr-retired-far-control-transfers
956Count retired far control transfers (which are always mispredicted).
957.It Li k8-fr-retired-fastpath-double-op-instructions Op Li ,mask= Ns Ar qualifier
958Count retired fastpath double op instructions.
959This event is supported in revision B and later CPUs.
960This event may be further qualified using
961.Ar qualifier ,
962which is a
963.Ql +
964separated set of the following keywords:
965.Pp
966.Bl -tag -width indent -compact
967.It Li low-op-pos-0
968Count instructions with the low op in position 0.
969.It Li low-op-pos-1
970Count instructions with the low op in position 1.
971.It Li low-op-pos-2
972Count instructions with the low op in position 2.
973.El
974.Pp
975The default is to count all types of instructions.
976.It Li k8-fr-retired-fpu-instructions Op Li ,mask= Ns Ar qualifier
977Count retired FPU instructions.
978This event is supported in revision B and later CPUs.
979This event may be further qualified using
980.Ar qualifier ,
981which is a
982.Ql +
983separated set of the following keywords:
984.Pp
985.Bl -tag -width indent -compact
986.It Li mmx-3dnow
987Count MMX and 3DNow!\& instructions.
988.It Li packed-sse-sse2
989Count packed SSE and SSE2 instructions.
990.It Li scalar-sse-sse2
Count scalar SSE and SSE2 instructions.
992.It Li x87
993Count x87 instructions.
994.El
995.Pp
996The default is to count all types of instructions.
997.It Li k8-fr-retired-near-returns
998Count retired near returns.
999.It Li k8-fr-retired-near-returns-mispredicted
1000Count mispredicted near returns.
1001.It Li k8-fr-retired-resyncs
1002Count retired resyncs (non-control transfer branches).
1003.It Li k8-fr-retired-taken-hardware-interrupts
1004Count retired taken hardware interrupts.
1005.It Li k8-fr-retired-taken-branches
1006Count retired taken branches.
1007.It Li k8-fr-retired-taken-branches-mispredicted
1008Count retired taken branches that were mispredicted.
1009.It Li k8-fr-retired-taken-branches-mispredicted-by-addr-miscompare
1010Count retired taken branches that were mispredicted only due to an
1011address miscompare.
1012.It Li k8-fr-retired-uops
1013Count retired uops.
1014.It Li k8-fr-retired-x86-instructions
1015Count retired x86 instructions including exceptions and interrupts.
1016.It Li k8-ic-fetch
1017Count instruction cache fetches.
1018.It Li k8-ic-instruction-fetch-stall
1019Count cycles in stalls due to instruction fetch.
1020.It Li k8-ic-l1-itlb-miss-and-l2-itlb-hit
1021Count L1 ITLB misses that are L2 ITLB hits.
1022.It Li k8-ic-l1-itlb-miss-and-l2-itlb-miss
1023Count ITLB misses that miss in both L1 and L2 ITLBs.
1024.It Li k8-ic-microarchitectural-resync-by-snoop
1025Count microarchitectural resyncs caused by snoops.
1026.It Li k8-ic-miss
1027Count instruction cache misses.
1028.It Li k8-ic-refill-from-l2
1029Count instruction cache refills from L2 cache.
1030.It Li k8-ic-refill-from-system
1031Count instruction cache refills from system memory.
1032.It Li k8-ic-return-stack-hits
1033Count hits to the return stack.
1034.It Li k8-ic-return-stack-overflow
1035Count overflows of the return stack.
1036.It Li k8-ls-buffer2-full
1037Count load/store buffer2 full events.
1038.It Li k8-ls-locked-operation Op Li ,mask= Ns Ar qualifier
1039Count locked operations.
1040For revision C and later CPUs, the following qualifiers are supported:
1041.Pp
1042.Bl -tag -width indent -compact
1043.It Li cycles-in-request
1044Count the number of cycles in the lock request/grant stage.
1045.It Li cycles-to-complete
Count the number of cycles a lock takes to complete once it is
non-speculative and is the oldest load/store operation.
1048.It Li locked-instructions
1049Count the number of lock instructions executed.
1050.El
1051.Pp
1052The default is to count the number of lock instructions executed.
1053.It Li k8-ls-microarchitectural-late-cancel
1054Count microarchitectural late cancels of operations in the load/store
1055unit.
1056.It Li k8-ls-microarchitectural-resync-by-self-modifying-code
1057Count microarchitectural resyncs caused by self-modifying code.
1058.It Li k8-ls-microarchitectural-resync-by-snoop
1059Count microarchitectural resyncs caused by snoops.
1060.It Li k8-ls-retired-cflush-instructions
Count retired CLFLUSH instructions.
1062.It Li k8-ls-retired-cpuid-instructions
1063Count retired CPUID instructions.
1064.It Li k8-ls-segment-register-load Op Li ,mask= Ns Ar qualifier
1065Count segment register loads.
1066This event may be further qualified using
1067.Ar qualifier ,
1068which is a
1069.Ql +
1070separated set of the following keywords:
.Pp
1071.Bl -tag -width indent -compact
1072.It Li cs
1073Count CS register loads.
1074.It Li ds
1075Count DS register loads.
1076.It Li es
1077Count ES register loads.
1078.It Li fs
1079Count FS register loads.
1080.It Li gs
1081Count GS register loads.
1082.\" .It Li hs
1083.\" Count HS register loads.
1084.\" XXX "HS" register?
1085.It Li ss
1086Count SS register loads.
1087.El
1088.Pp
1089The default is to count all types of loads.
1090.It Li k8-nb-memory-controller-bypass-saturation Op Li ,mask= Ns Ar qualifier
1091Count memory controller bypass counter saturation events.
1092This event may be further qualified using
1093.Ar qualifier ,
1094which is a
1095.Ql +
1096separated set of the following keywords:
1097.Pp
1098.Bl -tag -width indent -compact
1099.It Li dram-controller-interface-bypass
1100Count DRAM controller interface bypass.
1101.It Li dram-controller-queue-bypass
1102Count DRAM controller queue bypass.
1103.It Li memory-controller-hi-pri-bypass
1104Count memory controller high priority bypasses.
1105.It Li memory-controller-lo-pri-bypass
1106Count memory controller low priority bypasses.
1107.El
1108.Pp
1109.It Li k8-nb-memory-controller-dram-slots-missed
1110Count memory controller DRAM command slots missed (in MemClks).
1111.It Li k8-nb-memory-controller-page-access-event Op Li ,mask= Ns Ar qualifier
1112Count memory controller page access events.
1113This event may be further qualified using
1114.Ar qualifier ,
1115which is a
1116.Ql +
1117separated set of the following keywords:
1118.Pp
1119.Bl -tag -width indent -compact
1120.It Li page-conflict
1121Count page conflicts.
1122.It Li page-hit
1123Count page hits.
1124.It Li page-miss
1125Count page misses.
1126.El
1127.Pp
1128The default is to count all types of events.
1129.It Li k8-nb-memory-controller-page-table-overflow
Count memory controller page table overflow events.
1131.It Li k8-nb-probe-result Op Li ,mask= Ns Ar qualifier
1132Count probe events.
1133This event may be further qualified using
1134.Ar qualifier ,
1135which is a
1136.Ql +
1137separated set of the following keywords:
1138.Pp
1139.Bl -tag -width indent -compact
1140.It Li probe-hit
1141Count all probe hits.
1142.It Li probe-hit-dirty-no-memory-cancel
Count dirty probe hits without memory cancels.
.It Li probe-hit-dirty-with-memory-cancel
Count dirty probe hits with memory cancels.
1146.It Li probe-miss
1147Count probe misses.
1148.El
1149.It Li k8-nb-sized-commands Op Li ,mask= Ns Ar qualifier
1150Count sized commands issued.
1151This event may be further qualified using
1152.Ar qualifier ,
1153which is a
1154.Ql +
1155separated set of the following keywords:
1156.Pp
1157.Bl -tag -width indent -compact
1158.It Li nonpostwrszbyte
1159.It Li nonpostwrszdword
1160.It Li postwrszbyte
1161.It Li postwrszdword
1162.It Li rdszbyte
1163.It Li rdszdword
1164.It Li rdmodwr
1165.El
1166.Pp
1167The default is to count all types of commands.
1168.It Li k8-nb-memory-controller-turnaround Op Li ,mask= Ns Ar qualifier
Count memory controller turnaround events.
1170This event may be further qualified using
1171.Ar qualifier ,
1172which is a
1173.Ql +
1174separated set of the following keywords:
1175.Pp
1176.Bl -tag -width indent -compact
1177.\" XXX doc is unclear whether these are cycle counts or event counts
1178.It Li dimm-turnaround
1179Count DIMM turnarounds.
1180.It Li read-to-write-turnaround
1181Count read to write turnarounds.
1182.It Li write-to-read-turnaround
1183Count write to read turnarounds.
1184.El
1185.Pp
1186The default is to count all types of events.
1187.It Li k8-nb-ht-bus0-bandwidth Op Li ,mask= Ns Ar qualifier
1188.It Li k8-nb-ht-bus1-bandwidth Op Li ,mask= Ns Ar qualifier
1189.It Li k8-nb-ht-bus2-bandwidth Op Li ,mask= Ns Ar qualifier
1190Count events on the HyperTransport(tm) buses.
1191These events may be further qualified using
1192.Ar qualifier ,
1193which is a
1194.Ql +
1195separated set of the following keywords:
1196.Pp
1197.Bl -tag -width indent -compact
1198.It Li buffer-release
1199Count buffer release messages sent.
1200.It Li command
1201Count command messages sent.
1202.It Li data
1203Count data messages sent.
1204.It Li nop
1205Count nop messages sent.
1206.El
1207.Pp
1208The default is to count all types of messages.
1209.El
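.Pp
As an illustration of the
.Ql +
separated qualifier syntax used by the K8 events above, the following
sketch assembles an event specifier string from an event name and a
set of mask keywords.
The helper build_k8_spec() is hypothetical and is not part of the
library; it only performs string formatting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<event>,mask=<kw1>+<kw2>..." into buf.
 * With no keywords the bare event name is produced, which selects
 * the event's documented default.
 * Returns 0 on success, -1 if buf is too small.
 * build_k8_spec() is a hypothetical helper, not a libpmc function.
 */
static int
build_k8_spec(char *buf, size_t buflen, const char *event,
    const char *const *keywords, size_t nkeywords)
{
	size_t used;
	int n;

	assert(buf != NULL && event != NULL);

	n = snprintf(buf, buflen, "%s", event);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	used = (size_t)n;

	for (size_t i = 0; i < nkeywords; i++) {
		/* The first keyword follows ",mask="; later ones follow "+". */
		n = snprintf(buf + used, buflen - used, "%s%s",
		    i == 0 ? ",mask=" : "+", keywords[i]);
		if (n < 0 || (size_t)n >= buflen - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```

For example, the keywords x87 and packed-sse-sse2 applied to
k8-fr-retired-fpu-instructions yield the string
k8-fr-retired-fpu-instructions,mask=x87+packed-sse-sse2,
which could then be handed to an allocation function such as
pmc_allocate(3).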
1210.Ss Intel Pentium PMCS
1211Intel Pentium PMCs are present in Intel
1212.Tn Pentium
1213and
1214.Tn "Pentium MMX"
1215processors.
1216.Pp
1217These CPUs have two counters.
1218Some events may only be used on specific counters and some events
1219are defined only on processors supporting the MMX instruction set.
1220.Pp
1221These PMCs are documented in
1222.Rs
.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
1224.%T "Volume 3B: System Programming Guide, Part 2"
1225.%N "Order Number 253669-024US"
1226.%D "August 2007"
1227.%Q "Intel Corporation"
1228.Re
1229.Pp
1230Event specifiers for Intel Pentium PMCs can have the following common
1231qualifiers:
1232.Bl -tag -width indent
1233.It Li duration
1234Count duration (in clocks) of events.
1235The default is to count events.
1236.It Li os
1237Measure events at privilege levels 0, 1 and 2.
1238.It Li overflow
1239Assert the external processor pin associated with a counter on counter
1240overflow.
1241.It Li usr
1242Measure events at privilege level 3.
1243.El
1244.Pp
1245Note that these PMCs do not have the ability to interrupt the CPU.
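.Pp
The privilege level filtering performed by the os and usr qualifiers
can be sketched as a small predicate.
This is a minimal model of the rules listed above, not code from the
library; the flag names P5_QUAL_OS and P5_QUAL_USR and the function
p5_measures_cpl() are hypothetical.
No default is stated above for the case where neither qualifier is
given, so the sketch simply reports false in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names standing in for the "os" and "usr" qualifiers. */
#define P5_QUAL_OS	0x1u
#define P5_QUAL_USR	0x2u

/*
 * Return true if a Pentium PMC configured with 'quals' would measure
 * events occurring at privilege level 'cpl' (0 through 3).
 */
static bool
p5_measures_cpl(unsigned int quals, int cpl)
{
	assert(cpl >= 0 && cpl <= 3);

	/* "usr" covers privilege level 3 only. */
	if (cpl == 3)
		return ((quals & P5_QUAL_USR) != 0);

	/* "os" covers privilege levels 0, 1 and 2. */
	return ((quals & P5_QUAL_OS) != 0);
}
```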
1246.Pp
1247The event specifiers supported by Intel Pentium PMCs are:
1248.Bl -tag -width indent
1249.It Li p5-any-segment-register-loaded
1250The number of writes to any segment register, including the LDTR,
1251GDTR, TR and IDTR.
1252Far control transfers and task switches that involve privilege
1253level changes will count this event twice.
1254.It Li p5-bank-conflicts
1255The number of actual bank conflicts.
1256.It Li p5-branches
1257The number of taken and not taken branches including branches, jumps, calls,
1258software interrupts and interrupt returns.
1259.It Li p5-breakpoint-match-on-dr0-register
1260The number of matches on the DR0 breakpoint register.
1261.It Li p5-breakpoint-match-on-dr1-register
1262The number of matches on the DR1 breakpoint register.
1263.It Li p5-breakpoint-match-on-dr2-register
1264The number of matches on the DR2 breakpoint register.
1265.It Li p5-breakpoint-match-on-dr3-register
1266The number of matches on the DR3 breakpoint register.
1267.It Li p5-btb-false-entries
1268.Pq Tn Pentium MMX
1269The number of false entries in the BTB.
1270This event is only allocated on counter 0.
1271.It Li p5-btb-hits
The number of branches executed that hit in the branch target buffer.
1273.It Li p5-btb-miss-prediction-on-not-taken-branch
1274.Pq Tn Pentium MMX
1275The number of times the BTB predicted a not-taken branch as taken.
1276This event is only allocated on counter 1.
1277.It Li p5-bus-cycle-duration
1278The number of cycles while a bus cycle was in progress.
1279.It Li p5-bus-ownership-latency
1280.Pq Tn Pentium MMX
1281The time from bus ownership being requested to ownership being granted.
1282This event is only allocated on counter 0.
1283.It Li p5-bus-ownership-transfers
1284.Pq Tn Pentium MMX
1285The number of bus ownership transfers.
1286This event is only allocated on counter 1.
1287.It Li p5-bus-utilization-due-to-processor-activity
1288.Pq Tn Pentium MMX
1289The number of clocks the bus is busy due to the processor's own
1290activity.
1291This event is only allocated on counter 0.
1292.It Li p5-cache-line-sharing
1293.Pq Tn Pentium MMX
1294The number of shared data lines in L1 cache.
1295This event is only allocated on counter 1.
1296.It Li p5-cache-m-state-line-sharing
1297.Pq Tn Pentium MMX
The number of hits to an M-state line due to a memory access by
another processor.
1300This event is only allocated on counter 0.
1301.It Li p5-code-cache-miss
1302The number of instruction reads that miss the internal code cache.
1303Both cacheable and uncacheable misses are counted.
1304.It Li p5-code-read
1305The number of instruction reads to both cacheable and uncacheable regions.
1306.It Li p5-code-tlb-miss
1307The number of instruction reads that miss the instruction TLB.
Both cacheable and uncacheable reads are counted.
1309.It Li p5-d1-starvation-and-fifo-is-empty
1310.Pq Tn Pentium MMX
1311The number of times the D1 stage cannot issue any instructions because
1312the FIFO was empty.
1313This event is only allocated on counter 0.
1314.It Li p5-d1-starvation-and-only-one-instruction-in-fifo
1315.Pq Tn Pentium MMX
1316The number of times the D1 stage could issue only one instruction
1317because the FIFO had one instruction ready.
1318This event is only allocated on counter 1.
1319.It Li p5-data-cache-lines-written-back
1320The number of data cache lines that are written back, including
1321those caused by internal and external snoops.
1322.It Li p5-data-cache-tlb-miss-stall-duration
1323.Pq Tn Pentium MMX
1324The number of clocks the pipeline is stalled due to a data cache
1325TLB miss.
1326This event is only allocated on counter 1.
1327.It Li p5-data-read
1328The number of memory data reads, counting internal data cache hits and
1329misses.
1330I/O and data memory accesses due to TLB miss processing are
1331not included.
1332Split cycle reads are counted individually.
1333.It Li p5-data-read-miss
1334The number of memory read accesses that miss the data cache, counting
1335both cacheable and uncacheable accesses.
1336Data accesses that are part of TLB miss processing are not included.
1337I/O accesses are not included.
1338.It Li p5-data-read-miss-or-write-miss
1339The number of data reads and writes that miss the internal data cache,
1340counting uncacheable accesses.
1341Data accesses due to TLB miss processing are not counted.
1342.It Li p5-data-read-or-write
1343The number of data reads and writes including internal data cache hits
1344and misses.
1345Data reads due to TLB miss processing are not counted.
1346.It Li p5-data-tlb-miss
1347The number of misses to the data cache translation lookaside buffer.
1348.It Li p5-data-write
1349The number of memory data writes, counting internal data cache hits
1350and misses.
1351I/O is not included and split cycle writes are counted individually.
1352.It Li p5-data-write-miss
1353The number of memory write accesses that miss the data cache, counting
1354both cacheable and uncacheable accesses.
1355I/O accesses are not counted.
1356.It Li p5-emms-instructions-executed
1357.Pq Tn Pentium MMX
1358The number of EMMS instructions executed.
1359This event is only allocated on counter 0.
1360.It Li p5-external-data-cache-snoop-hits
1361The number of external snoops to the data cache that hit a valid line,
1362or the data line fill buffer, or one of the write back buffers.
1363.It Li p5-external-snoops
1364The number of external snoop requests accepted, including snoops that
1365hit in the code cache, the data cache and that hit in neither.
1366.It Li p5-floating-point-stalls-duration
1367.Pq Tn Pentium MMX
1368The number of cycles the pipeline is stalled due to a floating point
1369freeze.
1370This event is only allocated on counter 0.
1371.It Li p5-flops
The number of floating point adds, subtracts, multiplies, divides and
square roots.
1374Transcendental instructions trigger this event multiple times.
1375Instructions generating divide-by-zero, negative square root, special
1376operand and stack exceptions are not counted.
1377Integer multiply instructions that use the x87 FPU are counted.
1378.It Li p5-full-write-buffer-stall-duration-while-executing-mmx-instructions
1379.Pq Tn Pentium MMX
1380The number of clocks the pipeline has stalled due to full write
1381buffers when executing MMX instructions.
1382This event is only allocated on counter 0.
1383.It Li p5-hardware-interrupts
1384The number of taken INTR and NMI interrupts.
1385.It Li p5-instructions-executed
1386The number of instructions executed.
1387Repeat prefixed instructions are counted only once.
1388The HLT instruction is counted only once, irrespective of the number
1389of cycles spent in the halted state.
1390All hardware and software exceptions are counted as instructions, and
1391fault handler invocations are also counted as instructions.
1392.It Li p5-instructions-executed-v-pipe
1393The number of instructions that executed in the V pipe.
1394.It Li p5-io-read-or-write-cycle
1395The number of bus cycles directed to I/O space.
1396.It Li p5-locked-bus-cycle
1397The number of locked bus cycles that occur on account of the lock
1398prefixes, LOCK instructions, page table updates and descriptor table
1399updates.
1400.It Li p5-memory-accesses-in-both-pipes
1401The number of data memory reads or writes that are paired in both pipes.
1402.It Li p5-misaligned-data-memory-on-mmx-instructions
1403.Pq Tn Pentium MMX
1404The number of misaligned data memory references when executing MMX
1405instructions.
1406This event is only allocated on counter 0.
1407.It Li p5-misaligned-data-memory-or-io-references
1408The number of memory or I/O reads or writes that are not aligned on
1409natural boundaries.
2- and 4-byte accesses are counted as misaligned if they cross a 4-byte
boundary.
1412.It Li p5-mispredicted-or-unpredicted-returns
1413.Pq Tn Pentium MMX
1414The number of returns predicted incorrectly or not at all, only
1415counting RET instructions.
1416This event is only allocated on counter 0.
1417.It Li p5-mmx-instruction-data-read-misses
1418.Pq Tn Pentium MMX
1419The number of MMX instruction data read misses.
1420This event is only allocated on counter 1.
1421.It Li p5-mmx-instruction-data-reads
1422.Pq Tn Pentium MMX
1423The number of MMX instruction data reads.
1424This event is only allocated on counter 0.
1425.It Li p5-mmx-instruction-data-write-misses
1426.Pq Tn Pentium MMX
1427The number of data write misses caused by MMX instructions.
1428This event is only allocated on counter 1.
1429.It Li p5-mmx-instruction-data-writes
1430.Pq Tn Pentium MMX
1431The number of data writes caused by MMX instructions.
1432This event is only allocated on counter 0.
1433.It Li p5-mmx-instructions-executed-u-pipe
1434.Pq Tn Pentium MMX
1435The number of MMX instructions executed in the U pipe.
1436This event is only allocated on counter 0.
1437.It Li p5-mmx-instructions-executed-v-pipe
.Pq Tn Pentium MMX
1438The number of MMX instructions executed in the V pipe.
1439This event is only allocated on counter 1.
1440.It Li p5-mmx-multiply-unit-interlock
1441.Pq Tn Pentium MMX
1442The number of clocks the pipeline is stalled because the destination
1443of a prior MMX multiply is not ready.
1444This event is only allocated on counter 0.
1445.It Li p5-movd-movq-store-stall-due-to-previous-mmx-operation
1446.Pq Tn Pentium MMX
1447The number of clocks a MOVD/MOVQ instruction stalled in the D2 stage
1448of the pipeline due to a previous MMX instruction.
1449This event is only allocated on counter 1.
1450.It Li p5-noncacheable-memory-reads
1451The number of bus cycles for non-cacheable instruction or data reads,
1452including cycles caused by TLB misses.
1453.It Li p5-number-of-cycles-not-in-halt-state
1454.Pq Tn Pentium MMX
1455The number of cycles the processor is not idle due to the HLT
1456instruction.
1457This event is only allocated on counter 0.
1458.It Li p5-pipeline-agi-stalls
1459The number of address generation interlock stalls.
1460An AGI that occurs in both the U and V pipelines in the same clock
1461signals the event twice.
1462.It Li p5-pipeline-flushes
1463The number of pipeline flushes that occur.
1464Pipeline flushes are caused by branch mispredicts, exceptions,
1465interrupts, some segment register loads, and BTB misses.
1466Prefetch queue flushes due to serializing instructions are not
1467counted.
1468.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions
1469.Pq Tn Pentium MMX
1470The number of pipeline flushes due to wrong branch predictions
1471resolved in either the E- or WB- stage of the pipeline.
1472This event is only allocated on counter 0.
1473.It Li p5-pipeline-flushes-due-to-wrong-branch-predictions-resolved-in-wb-stage
1474.Pq Tn Pentium MMX
1475The number of pipeline flushes due to wrong branch predictions
resolved in the WB- stage of the pipeline.
1477This event is only allocated on counter 1.
1478.It Li p5-pipeline-stall-for-mmx-instruction-data-memory-reads
1479.Pq Tn Pentium MMX
1480The number of clocks during pipeline stalls caused by waiting MMX data
1481memory reads.
1482This event is only allocated on counter 0.
1483.It Li p5-predicted-returns
1484.Pq Tn Pentium MMX
1485The number of predicted returns, whether correct or incorrect.
1486This counter only counts RET instructions.
1487This event is only allocated on counter 1.
1488.It Li p5-returns
1489.Pq Tn Pentium MMX
1490The number of RET instructions executed.
1491This event is only allocated on counter 0.
1492.It Li p5-saturating-mmx-instructions-executed
1493.Pq Tn Pentium MMX
1494The number of saturating MMX instructions executed.
1495This event is only allocated on counter 0.
1496.It Li p5-saturations-performed
1497.Pq Tn Pentium MMX
The number of saturating MMX instructions executed for which at least
one result was actually saturated.
1500This event is only allocated on counter 1.
1501.It Li p5-stall-on-mmx-instruction-write-to-e-o-m-state-line
1502.Pq Tn Pentium MMX
1503The number of clocks during stalls on MMX instructions writing to
1504E- or M- state cache lines.
1505This event is only allocated on counter 1.
1506.It Li p5-stall-on-write-to-an-e-or-m-state-line
1507The number of stalls on a write to an exclusive or modified data cache
1508line.
1509.It Li p5-taken-branch-or-btb-hit
1510The number of events that may cause a hit in the BTB, namely either
1511taken branches or BTB hits.
1512.It Li p5-taken-branches
1513.Pq Tn Pentium MMX
1514The number of taken branches.
1515This event is only allocated on counter 1.
1516.It Li p5-transitions-between-mmx-and-fp-instructions
1517.Pq Tn Pentium MMX
1518The number of transitions between MMX and floating-point instructions
1519and vice-versa.
1520This event is only allocated on counter 1.
1521.It Li p5-waiting-for-data-memory-read-stall-duration
1522The number of clocks the pipeline was stalled waiting for data
1523memory reads.
Data TLB miss processing is included in this count.
1525.It Li p5-write-buffer-full-stall-duration
1526The number of clocks while the pipeline was stalled due to write
1527buffers being full.
1528.It Li p5-write-hit-to-m-or-e-state-lines
1529The number of writes that hit exclusive or modified lines in the data
1530cache.
1531.It Li p5-writes-to-noncacheable-memory
1532.Pq Tn Pentium MMX
1533The number of writes to non-cacheable memory, including write cycles
1534caused by TLB misses and I/O writes.
1535This event is only allocated on counter 1.
1536.El
1537.Ss Intel P6 PMCS
1538Intel P6 PMCs are present in Intel
1539.Tn "Pentium Pro" ,
1540.Tn "Pentium II" ,
1541.Tn Celeron ,
1542.Tn "Pentium III"
1543and
1544.Tn "Pentium M"
1545processors.
1546.Pp
1547These CPUs have two counters.
1548Some events may only be used on specific counters and some events are
1549defined only on specific processor models.
1550.Pp
1551These PMCs are documented in
1552.Rs
1553.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
1554.%T "Volume 3: System Programming Guide"
1555.%N "Order Number 245472-012"
1556.%D 2003
1557.%Q "Intel Corporation"
1558.Re
1559.Pp
1560Some of these events are affected by processor errata described in
1561.Rs
1562.%B "Intel(R) Pentium(R) III Processor Specification Update"
1563.%N "Document Number: 244453-054"
1564.%D "April 2005"
1565.%Q "Intel Corporation"
1566.Re
1567.Pp
1568Event specifiers for Intel P6 PMCs can have the following common
1569qualifiers:
1570.Bl -tag -width indent
1571.It Li cmask= Ns Ar value
1572Configure the PMC to increment only if the number of configured
1573events measured in a cycle is greater than or equal to
1574.Ar value .
1575.It Li edge
1576Configure the PMC to count the number of deasserted to asserted
1577transitions of the conditions expressed by the other qualifiers.
1578If specified, the counter will increment only once whenever a
1579condition becomes true, irrespective of the number of clocks during
1580which the condition remains true.
1581.It Li inv
Invert the sense of comparison when the
1583.Dq Li cmask
1584qualifier is present, making the counter increment when the number of
1585events per cycle is less than the value specified by the
1586.Dq Li cmask
1587qualifier.
1588.It Li os
1589Configure the PMC to count events happening at processor privilege
1590level 0.
1591.It Li umask= Ns Ar value
1592This qualifier is used to further qualify the event selected (see
1593below).
1594.It Li usr
1595Configure the PMC to count events occurring at privilege levels 1, 2
1596or 3.
1597.El
1598.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
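.Pp
The interaction of the cmask, inv and edge qualifiers can be sketched
with a small software model that replays a per-cycle event trace.
This is a simplified illustration and not a description of the
hardware: it assumes that without cmask the counter adds the raw
number of events in each cycle, that with cmask it adds one for each
cycle in which the comparison holds, and that edge counts only
transitions from false to true.
The structure p6_quals and the function p6_model_count() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the qualifiers being modelled. */
struct p6_quals {
	unsigned int cmask;	/* 0 means no cmask qualifier */
	bool inv;		/* only meaningful together with cmask */
	bool edge;
};

/*
 * Return the modelled counter value after observing events[i]
 * occurrences of the selected event in cycle i.
 */
static uint64_t
p6_model_count(const struct p6_quals *q, const unsigned int *events,
    size_t ncycles)
{
	uint64_t count = 0;
	bool prev = false;

	assert(q != NULL && (events != NULL || ncycles == 0));

	for (size_t i = 0; i < ncycles; i++) {
		bool cond;
		uint64_t inc;

		if (q->cmask == 0) {
			/* No threshold: count every event. */
			cond = events[i] > 0;
			inc = events[i];
		} else {
			/* Threshold compare, optionally inverted. */
			cond = q->inv ? events[i] < q->cmask :
			    events[i] >= q->cmask;
			inc = cond ? 1 : 0;
		}

		/* edge: count only deasserted-to-asserted transitions. */
		if (q->edge)
			inc = (cond && !prev) ? 1 : 0;

		count += inc;
		prev = cond;
	}
	return (count);
}
```

For a trace of 0, 2, 3, 1, 0, 2 events per cycle, this model gives 8
with no qualifiers, 3 with cmask=2, 3 with cmask=2 and inv, and 2 with
cmask=2 and edge.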
1604.Pp
1605The event specifiers supported by Intel P6 PMCs are:
1606.Bl -tag -width indent
1607.It Li p6-baclears
1608Count the number of times a static branch prediction was made by the
1609branch decoder because the BTB did not have a prediction.
1610.It Li p6-br-bac-missp-exec
1611.Pq Tn "Pentium M"
Count the number of branch instructions executed that were
1613mispredicted at the Front End (BAC).
1614.It Li p6-br-bogus
1615Count the number of bogus branches.
1616.It Li p6-br-call-exec
1617.Pq Tn "Pentium M"
1618Count the number of call instructions executed.
1619.It Li p6-br-call-missp-exec
1620.Pq Tn "Pentium M"
1621Count the number of call instructions executed that were mispredicted.
1622.It Li p6-br-cnd-exec
1623.Pq Tn "Pentium M"
1624Count the number of conditional branch instructions executed.
1625.It Li p6-br-cnd-missp-exec
1626.Pq Tn "Pentium M"
1627Count the number of conditional branch instructions executed that were
1628mispredicted.
1629.It Li p6-br-ind-call-exec
1630.Pq Tn "Pentium M"
1631Count the number of indirect call instructions executed.
1632.It Li p6-br-ind-exec
1633.Pq Tn "Pentium M"
1634Count the number of indirect branch instructions executed.
1635.It Li p6-br-ind-missp-exec
1636.Pq Tn "Pentium M"
1637Count the number of indirect branch instructions executed that were
1638mispredicted.
1639.It Li p6-br-inst-decoded
1640Count the number of branch instructions decoded.
1641.It Li p6-br-inst-exec
1642.Pq Tn "Pentium M"
Count the number of branch instructions executed but not necessarily retired.
1644.It Li p6-br-inst-retired
1645Count the number of branch instructions retired.
1646.It Li p6-br-miss-pred-retired
1647Count the number of mispredicted branch instructions retired.
1648.It Li p6-br-miss-pred-taken-ret
1649Count the number of taken mispredicted branches retired.
1650.It Li p6-br-missp-exec
1651.Pq Tn "Pentium M"
1652Count the number of branch instructions executed that were
1653mispredicted at execution.
1654.It Li p6-br-ret-bac-missp-exec
1655.Pq Tn "Pentium M"
1656Count the number of return instructions executed that were
1657mispredicted at the Front End (BAC).
1658.It Li p6-br-ret-exec
1659.Pq Tn "Pentium M"
1660Count the number of return instructions executed.
1661.It Li p6-br-ret-missp-exec
1662.Pq Tn "Pentium M"
1663Count the number of return instructions executed that were
1664mispredicted at execution.
1665.It Li p6-br-taken-retired
1666Count the number of taken branches retired.
1667.It Li p6-btb-misses
1668Count the number of branches for which the BTB did not produce a
1669prediction.
1670.It Li p6-bus-bnr-drv
1671Count the number of bus clock cycles during which this processor is
1672driving the BNR# pin.
1673.It Li p6-bus-data-rcv
1674Count the number of bus clock cycles during which this processor is
1675receiving data.
1676.It Li p6-bus-drdy-clocks Op Li ,umask= Ns Ar qualifier
1677Count the number of clocks during which DRDY# is asserted.
1678An additional qualifier may be specified, and comprises one of the
1679following keywords:
1680.Pp
1681.Bl -tag -width indent -compact
1682.It Li any
1683Count transactions generated by any agent on the bus.
1684.It Li self
1685Count transactions generated by this processor.
1686.El
1687.Pp
1688The default is to count operations generated by this processor.
1689.It Li p6-bus-hit-drv
1690Count the number of bus clock cycles during which this processor is
1691driving the HIT# pin.
1692.It Li p6-bus-hitm-drv
1693Count the number of bus clock cycles during which this processor is
1694driving the HITM# pin.
1695.It Li p6-bus-lock-clocks Op Li ,umask= Ns Ar qualifier
Count the number of clocks during which LOCK# is asserted on the
1697external system bus.
1698An additional qualifier may be specified and comprises one of the following
1699keywords:
1700.Pp
1701.Bl -tag -width indent -compact
1702.It Li any
1703Count transactions generated by any agent on the bus.
1704.It Li self
1705Count transactions generated by this processor.
1706.El
1707.Pp
1708The default is to count operations generated by this processor.
1709.It Li p6-bus-req-outstanding
1710Count the number of bus requests outstanding in any given cycle.
1711.It Li p6-bus-snoop-stall
1712Count the number of clock cycles during which the bus is snoop stalled.
1713.It Li p6-bus-tran-any Op Li ,umask= Ns Ar qualifier
1714Count the number of completed bus transactions of any kind.
1715An additional qualifier may be specified and comprises one of the following
1716keywords:
1717.Pp
1718.Bl -tag -width indent -compact
1719.It Li any
1720Count transactions generated by any agent on the bus.
1721.It Li self
1722Count transactions generated by this processor.
1723.El
1724.Pp
1725The default is to count operations generated by this processor.
1726.It Li p6-bus-tran-brd Op Li ,umask= Ns Ar qualifier
1727Count the number of burst read transactions.
1728An additional qualifier may be specified and comprises one of the following
1729keywords:
1730.Pp
1731.Bl -tag -width indent -compact
1732.It Li any
1733Count transactions generated by any agent on the bus.
1734.It Li self
1735Count transactions generated by this processor.
1736.El
1737.Pp
1738The default is to count operations generated by this processor.
1739.It Li p6-bus-tran-burst Op Li ,umask= Ns Ar qualifier
1740Count the number of completed burst transactions.
1741An additional qualifier may be specified and comprises one of the following
1742keywords:
1743.Pp
1744.Bl -tag -width indent -compact
1745.It Li any
1746Count transactions generated by any agent on the bus.
1747.It Li self
1748Count transactions generated by this processor.
1749.El
1750.Pp
1751The default is to count operations generated by this processor.
1752.It Li p6-bus-tran-def Op Li ,umask= Ns Ar qualifier
1753Count the number of completed deferred transactions.
1754An additional qualifier may be specified and comprises one of the following
1755keywords:
1756.Pp
1757.Bl -tag -width indent -compact
1758.It Li any
1759Count transactions generated by any agent on the bus.
1760.It Li self
1761Count transactions generated by this processor.
1762.El
1763.Pp
1764The default is to count operations generated by this processor.
1765.It Li p6-bus-tran-ifetch Op Li ,umask= Ns Ar qualifier
1766Count the number of completed instruction fetch transactions.
1767An additional qualifier may be specified and comprises one of the following
1768keywords:
1769.Pp
1770.Bl -tag -width indent -compact
1771.It Li any
1772Count transactions generated by any agent on the bus.
1773.It Li self
1774Count transactions generated by this processor.
1775.El
1776.Pp
1777The default is to count operations generated by this processor.
1778.It Li p6-bus-tran-inval Op Li ,umask= Ns Ar qualifier
1779Count the number of completed invalidate transactions.
1780An additional qualifier may be specified and comprises one of the following
1781keywords:
1782.Pp
1783.Bl -tag -width indent -compact
1784.It Li any
1785Count transactions generated by any agent on the bus.
1786.It Li self
1787Count transactions generated by this processor.
1788.El
1789.Pp
1790The default is to count operations generated by this processor.
1791.It Li p6-bus-tran-mem Op Li ,umask= Ns Ar qualifier
1792Count the number of completed memory transactions.
1793An additional qualifier may be specified and comprises one of the following
1794keywords:
1795.Pp
1796.Bl -tag -width indent -compact
1797.It Li any
1798Count transactions generated by any agent on the bus.
1799.It Li self
1800Count transactions generated by this processor.
1801.El
1802.Pp
1803The default is to count operations generated by this processor.
1804.It Li p6-bus-tran-pwr Op Li ,umask= Ns Ar qualifier
1805Count the number of completed partial write transactions.
1806An additional qualifier may be specified and comprises one of the following
1807keywords:
1808.Pp
1809.Bl -tag -width indent -compact
1810.It Li any
1811Count transactions generated by any agent on the bus.
1812.It Li self
1813Count transactions generated by this processor.
1814.El
1815.Pp
1816The default is to count operations generated by this processor.
1817.It Li p6-bus-tran-rfo Op Li ,umask= Ns Ar qualifier
1818Count the number of completed read-for-ownership transactions.
1819An additional qualifier may be specified and comprises one of the following
1820keywords:
1821.Pp
1822.Bl -tag -width indent -compact
1823.It Li any
1824Count transactions generated by any agent on the bus.
1825.It Li self
1826Count transactions generated by this processor.
1827.El
1828.Pp
1829The default is to count operations generated by this processor.
1830.It Li p6-bus-trans-io Op Li ,umask= Ns Ar qualifier
1831Count the number of completed I/O transactions.
1832An additional qualifier may be specified and comprises one of the following
1833keywords:
1834.Pp
1835.Bl -tag -width indent -compact
1836.It Li any
1837Count transactions generated by any agent on the bus.
1838.It Li self
1839Count transactions generated by this processor.
1840.El
1841.Pp
1842The default is to count operations generated by this processor.
1843.It Li p6-bus-trans-p Op Li ,umask= Ns Ar qualifier
1844Count the number of completed partial transactions.
1845An additional qualifier may be specified and comprises one of the following
1846keywords:
1847.Pp
1848.Bl -tag -width indent -compact
1849.It Li any
1850Count transactions generated by any agent on the bus.
1851.It Li self
1852Count transactions generated by this processor.
1853.El
1854.Pp
1855The default is to count operations generated by this processor.
1856.It Li p6-bus-trans-wb Op Li ,umask= Ns Ar qualifier
1857Count the number of completed write-back transactions.
1858An additional qualifier may be specified and comprises one of the following
1859keywords:
1860.Pp
1861.Bl -tag -width indent -compact
1862.It Li any
1863Count transactions generated by any agent on the bus.
1864.It Li self
1865Count transactions generated by this processor.
1866.El
1867.Pp
1868The default is to count operations generated by this processor.
1869.It Li p6-cpu-clk-unhalted
Count the number of cycles during which the processor was not halted.
1871.Pp
1872.Pq Tn "Pentium M"
Count the number of cycles during which the processor was not halted
1874and not in a thermal trip.
1875.It Li p6-cycles-div-busy
1876Count the number of cycles during which the divider is busy and cannot
1877accept new divides.
1878This event is only allocated on counter 0.
1879.It Li p6-cycles-in-pending-and-masked
1880Count the number of processor cycles for which interrupts were
1881disabled and interrupts were pending.
1882.It Li p6-cycles-int-masked
1883Count the number of processor cycles for which interrupts were
1884disabled.
1885.It Li p6-data-mem-refs
1886Count all loads and all stores using any memory type, including
1887internal retries.
1888Each part of a split store is counted separately.
1889.It Li p6-dcu-lines-in
1890Count the total lines allocated in the data cache unit.
1891.It Li p6-dcu-m-lines-in
1892Count the number of M state lines allocated in the data cache unit.
1893.It Li p6-dcu-m-lines-out
1894Count the number of M state lines evicted from the data cache unit.
1895.It Li p6-dcu-miss-outstanding
1896Count the weighted number of cycles while a data cache unit miss is
1897outstanding, incremented by the number of outstanding cache misses at
1898any time.
1899.It Li p6-div
1900Count the number of integer and floating-point divides including
1901speculative divides.
1902This event is only allocated on counter 1.
1903.It Li p6-emon-esp-uops
1904.Pq Tn "Pentium M"
1905Count the total number of micro-ops.
1906.It Li p6-emon-est-trans Op Li ,umask= Ns Ar qualifier
1907.Pq Tn "Pentium M"
1908Count the number of
1909.Tn "Enhanced Intel SpeedStep"
1910transitions.
1911An additional qualifier may be specified, and can be one of the
1912following keywords:
1913.Pp
1914.Bl -tag -width indent -compact
1915.It Li all
1916Count all transitions.
1917.It Li freq
1918Count only frequency transitions.
1919.El
1920.Pp
1921The default is to count all transitions.
1922.It Li p6-emon-fused-uops-ret Op Li ,umask= Ns Ar qualifier
1923.Pq Tn "Pentium M"
1924Count the number of retired fused micro-ops.
1925An additional qualifier may be specified, and may be one of the
1926following keywords:
1927.Pp
1928.Bl -tag -width indent -compact
1929.It Li all
1930Count all fused micro-ops.
1931.It Li loadop
1932Count only load and op micro-ops.
1933.It Li stdsta
1934Count only STD/STA micro-ops.
1935.El
1936.Pp
1937The default is to count all fused micro-ops.
.It Li p6-emon-kni-comp-inst-ret Op Li ,umask= Ns Ar qualifier
1939.Pq Tn "Pentium III"
1940Count the number of SSE computational instructions retired.
1941An additional qualifier may be specified, and comprises one of the
1942following keywords:
1943.Pp
1944.Bl -tag -width indent -compact
1945.It Li packed-and-scalar
1946Count packed and scalar operations.
1947.It Li scalar
1948Count scalar operations only.
1949.El
1950.Pp
1951The default is to count packed and scalar operations.
1952.It Li p6-emon-kni-inst-retired Op Li ,umask= Ns Ar qualifier
1953.Pq Tn "Pentium III"
1954Count the number of SSE instructions retired.
1955An additional qualifier may be specified, and comprises one of the
1956following keywords:
1957.Pp
1958.Bl -tag -width indent -compact
1959.It Li packed-and-scalar
1960Count packed and scalar operations.
1961.It Li scalar
1962Count scalar operations only.
1963.El
1964.Pp
1965The default is to count packed and scalar operations.
1966.It Li p6-emon-kni-pref-dispatched Op Li ,umask= Ns Ar qualifier
1967.Pq Tn "Pentium III"
1968Count the number of SSE prefetch or weakly ordered instructions
1969dispatched (including speculative prefetches).
1970An additional qualifier may be specified, and comprises one of the
1971following keywords:
1972.Pp
1973.Bl -tag -width indent -compact
1974.It Li nta
1975Count non-temporal prefetches.
1976.It Li t1
1977Count prefetches to L1.
1978.It Li t2
1979Count prefetches to L2.
1980.It Li wos
1981Count weakly ordered stores.
1982.El
1983.Pp
1984The default is to count non-temporal prefetches.
1985.It Li p6-emon-kni-pref-miss Op Li ,umask= Ns Ar qualifier
1986.Pq Tn "Pentium III"
1987Count the number of prefetch or weakly ordered instructions that miss
1988all caches.
1989An additional qualifier may be specified, and comprises one of the
1990following keywords:
1991.Pp
1992.Bl -tag -width indent -compact
1993.It Li nta
1994Count non-temporal prefetches.
1995.It Li t1
1996Count prefetches to L1.
1997.It Li t2
1998Count prefetches to L2.
1999.It Li wos
2000Count weakly ordered stores.
2001.El
2002.Pp
2003The default is to count non-temporal prefetches.
2004.It Li p6-emon-pref-rqsts-dn
2005.Pq Tn "Pentium M"
2006Count the number of downward prefetches issued.
2007.It Li p6-emon-pref-rqsts-up
2008.Pq Tn "Pentium M"
2009Count the number of upward prefetches issued.
2010.It Li p6-emon-simd-instr-retired
2011.Pq Tn "Pentium M"
2012Count the number of retired
2013.Tn MMX
2014instructions.
2015.It Li p6-emon-sse-sse2-comp-inst-retired Op Li ,umask= Ns Ar qualifier
2016.Pq Tn "Pentium M"
2017Count the number of computational SSE instructions retired.
2018An additional qualifier may be specified and can be one of the
2019following keywords:
2020.Pp
2021.Bl -tag -width indent -compact
2022.It Li sse-packed-single
2023Count SSE packed-single instructions.
2024.It Li sse-scalar-single
2025Count SSE scalar-single instructions.
2026.It Li sse2-packed-double
2027Count SSE2 packed-double instructions.
2028.It Li sse2-scalar-double
2029Count SSE2 scalar-double instructions.
2030.El
2031.Pp
2032The default is to count SSE packed-single instructions.
.It Li p6-emon-sse-sse2-inst-retired Op Li ,umask= Ns Ar qualifier
2034.Pp
2035.Pq Tn "Pentium M"
2036Count the number of SSE instructions retired.
2037An additional qualifier can be specified, and can be one of the
2038following keywords:
2039.Pp
2040.Bl -tag -width indent -compact
2041.It Li sse-packed-single
2042Count SSE packed-single instructions.
2043.It Li sse-packed-single-scalar-single
2044Count SSE packed-single and scalar-single instructions.
2045.It Li sse2-packed-double
2046Count SSE2 packed-double instructions.
2047.It Li sse2-scalar-double
2048Count SSE2 scalar-double instructions.
2049.El
2050.Pp
2051The default is to count SSE packed-single instructions.
2052.It Li p6-emon-synch-uops
2053.Pq Tn "Pentium M"
2054Count the number of sync micro-ops.
2055.It Li p6-emon-thermal-trip
2056.Pq Tn "Pentium M"
2057Count the duration or occurrences of thermal trips.
2058Use the
2059.Dq Li edge
2060qualifier to count occurrences of thermal trips.
2061.It Li p6-emon-unfusion
2062.Pq Tn "Pentium M"
2063Count the number of unfusion events in the reorder buffer.
2064.It Li p6-flops
2065Count the number of computational floating point operations retired.
2066This event is only allocated on counter 0.
2067.It Li p6-fp-assist
2068Count the number of floating point exceptions handled by microcode.
2069This event is only allocated on counter 1.
2070.It Li p6-fp-comps-ops-exe
Count the number of computational floating-point operations executed.
2072This event is only allocated on counter 0.
2073.It Li p6-fp-mmx-trans Op Li ,umask= Ns Ar qualifier
2074.Pq Tn "Pentium II" , Tn "Pentium III"
2075Count the number of transitions between MMX and floating-point
2076instructions.
2077An additional qualifier may be specified, and comprises one of the
2078following keywords:
2079.Pp
2080.Bl -tag -width indent -compact
2081.It Li mmxtofp
2082Count transitions from MMX instructions to floating-point instructions.
2083.It Li fptommx
2084Count transitions from floating-point instructions to MMX instructions.
2085.El
2086.Pp
2087The default is to count MMX to floating-point transitions.
2088.It Li p6-hw-int-rx
2089Count the number of hardware interrupts received.
2090.It Li p6-ifu-fetch
2091Count the number of instruction fetches, both cacheable and non-cacheable.
2092.It Li p6-ifu-fetch-miss
2093Count the number of instruction fetch misses (i.e., those that produce
2094memory accesses).
2095.It Li p6-ifu-mem-stall
2096Count the number of cycles instruction fetch is stalled for any reason.
2097.It Li p6-ild-stall
2098Count the number of cycles the instruction length decoder is stalled.
2099.It Li p6-inst-decoded
2100Count the number of instructions decoded.
2101.It Li p6-inst-retired
2102Count the number of instructions retired.
2103.It Li p6-itlb-miss
2104Count the number of instruction TLB misses.
2105.It Li p6-l2-ads
2106Count the number of L2 address strobes.
2107.It Li p6-l2-dbus-busy
2108Count the number of cycles during which the L2 cache data bus was busy.
2109.It Li p6-l2-dbus-busy-rd
2110Count the number of cycles during which the L2 cache data bus was busy
2111transferring read data from L2 to the processor.
2112.It Li p6-l2-ifetch Op Li ,umask= Ns Ar qualifier
2113Count the number of L2 instruction fetches.
2114An additional qualifier may be specified and comprises a list of the following
2115keywords separated by
2116.Ql +
2117characters:
2118.Pp
2119.Bl -tag -width indent -compact
2120.It Li e
2121Count operations affecting E (exclusive) state lines.
2122.It Li i
2123Count operations affecting I (invalid) state lines.
2124.It Li m
2125Count operations affecting M (modified) state lines.
2126.It Li s
2127Count operations affecting S (shared) state lines.
2128.El
2129.Pp
2130The default is to count operations affecting all (MESI) state lines.
2131.It Li p6-l2-ld Op Li ,umask= Ns Ar qualifier
2132Count the number of L2 data loads.
2133An additional qualifier may be specified and comprises a list of the following
2134keywords separated by
2135.Ql +
2136characters:
2137.Pp
2138.Bl -tag -width indent -compact
2139.It Li both
2140.Pq Tn "Pentium M"
2141Count both hardware-prefetched lines and non-hardware-prefetched lines.
2142.It Li e
2143Count operations affecting E (exclusive) state lines.
2144.It Li hw
2145.Pq Tn "Pentium M"
2146Count hardware-prefetched lines only.
2147.It Li i
2148Count operations affecting I (invalid) state lines.
2149.It Li m
2150Count operations affecting M (modified) state lines.
2151.It Li nonhw
2152.Pq Tn "Pentium M"
2153Exclude hardware-prefetched lines.
2154.It Li s
2155Count operations affecting S (shared) state lines.
2156.El
2157.Pp
2158The default on processors other than
2159.Tn "Pentium M"
2160processors is to count operations affecting all (MESI) state lines.
2161The default on
2162.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E53.
2167.It Li p6-l2-lines-in Op Li ,umask= Ns Ar qualifier
2168Count the number of L2 lines allocated.
2169An additional qualifier may be specified and comprises a list of the following
2170keywords separated by
2171.Ql +
2172characters:
2173.Pp
2174.Bl -tag -width indent -compact
2175.It Li both
2176.Pq Tn "Pentium M"
2177Count both hardware-prefetched lines and non-hardware-prefetched lines.
2178.It Li e
2179Count operations affecting E (exclusive) state lines.
2180.It Li hw
2181.Pq Tn "Pentium M"
2182Count hardware-prefetched lines only.
2183.It Li i
2184Count operations affecting I (invalid) state lines.
2185.It Li m
2186Count operations affecting M (modified) state lines.
2187.It Li nonhw
2188.Pq Tn "Pentium M"
2189Exclude hardware-prefetched lines.
2190.It Li s
2191Count operations affecting S (shared) state lines.
2192.El
2193.Pp
2194The default on processors other than
2195.Tn "Pentium M"
2196processors is to count operations affecting all (MESI) state lines.
2197The default on
2198.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E45.
2203.It Li p6-l2-lines-out Op Li ,umask= Ns Ar qualifier
2204Count the number of L2 lines evicted.
2205An additional qualifier may be specified and comprises a list of the following
2206keywords separated by
2207.Ql +
2208characters:
2209.Pp
2210.Bl -tag -width indent -compact
2211.It Li both
2212.Pq Tn "Pentium M"
2213Count both hardware-prefetched lines and non-hardware-prefetched lines.
2214.It Li e
2215Count operations affecting E (exclusive) state lines.
2216.It Li hw
2217.Pq Tn "Pentium M"
2218Count hardware-prefetched lines only.
2219.It Li i
2220Count operations affecting I (invalid) state lines.
2221.It Li m
2222Count operations affecting M (modified) state lines.
2223.It Li nonhw
.Pq Tn "Pentium M"
2225Exclude hardware-prefetched lines.
2226.It Li s
2227Count operations affecting S (shared) state lines.
2228.El
2229.Pp
2230The default on processors other than
2231.Tn "Pentium M"
2232processors is to count operations affecting all (MESI) state lines.
2233The default on
2234.Tn "Pentium M"
processors is to count both hardware-prefetched and
non-hardware-prefetched operations on all (MESI) state lines.
.Pq Errata
This event is affected by processor erratum E45.
2239.It Li p6-l2-m-lines-inm
2240Count the number of modified lines allocated in L2 cache.
2241.It Li p6-l2-m-lines-outm Op Li ,umask= Ns Ar qualifier
2242Count the number of L2 M-state lines evicted.
2243.Pp
2244.Pq Tn "Pentium M"
2245On these processors an additional qualifier may be specified and
2246comprises a list of the following keywords separated by
2247.Ql +
2248characters:
2249.Pp
2250.Bl -tag -width indent -compact
2251.It Li both
2252Count both hardware-prefetched lines and non-hardware-prefetched lines.
2253.It Li hw
2254Count hardware-prefetched lines only.
2255.It Li nonhw
2256Exclude hardware-prefetched lines.
2257.El
2258.Pp
The default is to count both hardware-prefetched and
non-hardware-prefetched operations.
.Pq Errata
This event is affected by processor erratum E53.
2263.It Li p6-l2-rqsts Op Li ,umask= Ns Ar qualifier
2264Count the total number of L2 requests.
2265An additional qualifier may be specified and comprises a list of the following
2266keywords separated by
2267.Ql +
2268characters:
2269.Pp
2270.Bl -tag -width indent -compact
2271.It Li e
2272Count operations affecting E (exclusive) state lines.
2273.It Li i
2274Count operations affecting I (invalid) state lines.
2275.It Li m
2276Count operations affecting M (modified) state lines.
2277.It Li s
2278Count operations affecting S (shared) state lines.
2279.El
2280.Pp
2281The default is to count operations affecting all (MESI) state lines.
.It Li p6-l2-st Op Li ,umask= Ns Ar qualifier
2283Count the number of L2 data stores.
2284An additional qualifier may be specified and comprises a list of the following
2285keywords separated by
2286.Ql +
2287characters:
2288.Pp
2289.Bl -tag -width indent -compact
2290.It Li e
2291Count operations affecting E (exclusive) state lines.
2292.It Li i
2293Count operations affecting I (invalid) state lines.
2294.It Li m
2295Count operations affecting M (modified) state lines.
2296.It Li s
2297Count operations affecting S (shared) state lines.
2298.El
2299.Pp
2300The default is to count operations affecting all (MESI) state lines.
2301.It Li p6-ld-blocks
2302Count the number of load operations delayed due to store buffer blocks.
2303.It Li p6-misalign-mem-ref
Count the number of misaligned data memory references (crossing a
64-bit boundary).
2306.It Li p6-mmx-assist
2307.Pq Tn "Pentium II" , Tn "Pentium III"
2308Count the number of MMX assists executed.
2309.It Li p6-mmx-instr-exec
2310.Pq Tn Celeron , Tn "Pentium II"
2311Count the number of MMX instructions executed, except MOVQ and MOVD
2312stores from register to memory.
2313.It Li p6-mmx-instr-ret
2314.Pq Tn "Pentium II"
2315Count the number of MMX instructions retired.
2316.It Li p6-mmx-instr-type-exec Op Li ,umask= Ns Ar qualifier
2317.Pq Tn "Pentium II" , Tn "Pentium III"
2318Count the number of MMX instructions executed.
2319An additional qualifier may be specified and comprises a list of
2320the following keywords separated by
2321.Ql +
2322characters:
2323.Pp
2324.Bl -tag -width indent -compact
2325.It Li pack
2326Count MMX pack operation instructions.
2327.It Li packed-arithmetic
2328Count MMX packed arithmetic instructions.
2329.It Li packed-logical
2330Count MMX packed logical instructions.
2331.It Li packed-multiply
2332Count MMX packed multiply instructions.
2333.It Li packed-shift
2334Count MMX packed shift instructions.
2335.It Li unpack
2336Count MMX unpack operation instructions.
2337.El
2338.Pp
2339The default is to count all operations.
2340.It Li p6-mmx-sat-instr-exec
2341.Pq Tn "Pentium II" , Tn "Pentium III"
2342Count the number of MMX saturating instructions executed.
2343.It Li p6-mmx-uops-exec
2344.Pq Tn "Pentium II" , Tn "Pentium III"
2345Count the number of MMX micro-ops executed.
2346.It Li p6-mul
2347Count the number of integer and floating-point multiplies, including
2348speculative multiplies.
2349This event is only allocated on counter 1.
2350.It Li p6-partial-rat-stalls
2351Count the number of cycles or events for partial stalls.
2352.It Li p6-resource-stalls
Count the number of cycles during which there was a resource-related
stall of any kind.
2354.It Li p6-ret-seg-renames
2355.Pq Tn "Pentium II" , Tn "Pentium III"
2356Count the number of segment register rename events retired.
2357.It Li p6-sb-drains
2358Count the number of cycles the store buffer is draining.
2359.It Li p6-seg-reg-renames Op Li ,umask= Ns Ar qualifier
2360.Pq Tn "Pentium II" , Tn "Pentium III"
2361Count the number of segment register renames.
2362An additional qualifier may be specified, and comprises a list of the
2363following keywords separated by
2364.Ql +
2365characters:
2366.Pp
2367.Bl -tag -width indent -compact
2368.It Li ds
2369Count renames for segment register DS.
2370.It Li es
2371Count renames for segment register ES.
2372.It Li fs
2373Count renames for segment register FS.
2374.It Li gs
2375Count renames for segment register GS.
2376.El
2377.Pp
2378The default is to count operations affecting all segment registers.
.It Li p6-seg-rename-stalls Op Li ,umask= Ns Ar qualifier
2380.Pq Tn "Pentium II" , Tn "Pentium III"
2381Count the number of segment register renaming stalls.
2382An additional qualifier may be specified, and comprises a list of the
2383following keywords separated by
2384.Ql +
2385characters:
2386.Pp
2387.Bl -tag -width indent -compact
2388.It Li ds
2389Count stalls for segment register DS.
2390.It Li es
2391Count stalls for segment register ES.
2392.It Li fs
2393Count stalls for segment register FS.
2394.It Li gs
2395Count stalls for segment register GS.
2396.El
2397.Pp
2398The default is to count operations affecting all the segment registers.
2399.It Li p6-segment-reg-loads
2400Count the number of segment register loads.
2401.It Li p6-uops-retired
2402Count the number of micro-ops retired.
2403.El
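.Pp
As an illustration of the qualifier syntax used by the P6 event
specifiers above, the following sketch joins a list of keywords with
.Ql +
characters to form a
.Dq Li umask=
qualifier.
The helper
.Fn p6_build_spec
is hypothetical and is not part of
.Lb libpmc ;
it only assembles the string that an application would then hand to an
allocation routine such as
.Xr pmc_allocate 3 .

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): write
 * "event,umask=k1+k2+..." into buf.  With nkw == 0 only the event
 * name is written, which selects the event's documented default.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
p6_build_spec(char *buf, size_t len, const char *event,
    const char *const *kw, size_t nkw)
{
	size_t need, i;

	assert(buf != NULL && event != NULL);

	/* Compute the space required, including the terminating NUL. */
	need = strlen(event) + 1;
	if (nkw > 0)
		need += strlen(",umask=");
	for (i = 0; i < nkw; i++)
		need += strlen(kw[i]) + (i > 0 ? 1 : 0);
	if (need > len)
		return (-1);

	/* The size check above makes the copies below safe. */
	strcpy(buf, event);
	if (nkw > 0)
		strcat(buf, ",umask=");
	for (i = 0; i < nkw; i++) {
		if (i > 0)
			strcat(buf, "+");
		strcat(buf, kw[i]);
	}
	return (0);
}
```

For example, calling the helper with the event
.Dq Li p6-l2-ld
and the keywords
.Dq Li m
and
.Dq Li e
produces
.Dq Li p6-l2-ld,umask=m+e .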
2404.Ss Intel P4 PMCS
2405Intel P4 PMCs are present in Intel
2406.Tn "Pentium 4"
2407and
2408.Tn Xeon
2409processors.
2410These PMCs are documented in
2411.Rs
2412.%B "IA-32 Intel(R) Architecture Software Developer's Manual"
2413.%T "Volume 3: System Programming Guide"
2414.%N "Order Number 245472-012"
2415.%D 2003
2416.%Q "Intel Corporation"
2417.Re
2418Further information about using these PMCs may be found in
2419.Rs
2420.%B "IA-32 Intel(R) Architecture Optimization Guide"
2421.%D 2003
2422.%N "Order Number 248966-009"
2423.%Q "Intel Corporation"
2424.Re
2425Some of these events are affected by processor errata described in
2426.Rs
2427.%B "Intel(R) Pentium(R) 4 Processor Specification Update"
2428.%N "Document Number: 249199-059"
2429.%D "April 2005"
2430.%Q "Intel Corporation"
2431.Re
2432.Pp
2433Event specifiers for Intel P4 PMCs can have the following common
2434qualifiers:
2435.Bl -tag -width indent
2436.It Li active= Ns Ar choice
2437(On P4 HTT CPUs) Filter event counting based on which logical
2438processors are active.
2439The allowed values of
2440.Ar choice
2441are:
2442.Pp
2443.Bl -tag -width indent -compact
2444.It Li any
2445Count when either logical processor is active.
2446.It Li both
2447Count when both logical processors are active.
2448.It Li none
2449Count only when neither logical processor is active.
2450.It Li single
2451Count only when one logical processor is active.
2452.El
2453.Pp
2454The default is
2455.Dq Li both .
2456.It Li cascade
2457Configure the PMC to cascade onto its partner.
2458See
2459.Sx "Cascading P4 PMCs"
2460below for more information.
2461.It Li edge
Configure the counter to count false-to-true transitions of the threshold
comparison output.
2464This qualifier only takes effect if a threshold qualifier has also been
2465specified.
2466.It Li complement
2467Configure the counter to increment only when the event count seen is
2468less than the threshold qualifier value specified.
2469.It Li mask= Ns Ar qualifier
2470Many event specifiers for Intel P4 PMCs need to be additionally
2471qualified using a mask qualifier.
2472The allowed syntax for these qualifiers is event specific and is
2473described along with the events.
2474.It Li os
2475Configure the PMC to count when the CPL of the processor is 0.
2476.It Li precise
2477Select precise event based sampling.
2478Precise sampling is supported by the hardware for a limited set of
2479events.
2480.It Li tag= Ns Ar value
2481Configure the PMC to tag the internal uop selected by the other
2482fields in this event specifier with value
2483.Ar value .
2484This feature is used when cascading PMCs.
2485.It Li threshold= Ns Ar value
2486Configure the PMC to increment only when the event counts seen are
2487greater than the specified threshold value
2488.Ar value .
2489.It Li usr
2490Configure the PMC to count when the CPL of the processor is 1, 2 or 3.
2491.El
2492.Pp
If neither the
.Dq Li os
nor the
.Dq Li usr
qualifier is specified, the default is to enable both.
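.Pp
The interaction of the
.Dq Li threshold ,
.Dq Li complement
and
.Dq Li edge
qualifiers described above can be sketched with a small software model.
This only illustrates the wording of this manual page and is not a
description of the hardware.
The type
.Vt "struct p4_filter"
and the function
.Fn p4_model_count
are hypothetical.
The model assumes that the counter advances by one for each cycle in
which the comparison holds, and that the comparison output is false
before the first cycle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the filtering qualifiers; not a libpmc type. */
struct p4_filter {
	unsigned threshold;	/* value of the threshold= qualifier */
	bool complement;	/* "complement" qualifier present */
	bool edge;		/* "edge" qualifier present */
};

/*
 * Return how often the modelled counter increments, given the number
 * of events seen in each of ncycles cycles.
 */
static unsigned long
p4_model_count(const struct p4_filter *f, const unsigned *events,
    size_t ncycles)
{
	unsigned long total = 0;
	bool prev = false;	/* assumed initial comparison output */
	bool cur;
	size_t i;

	for (i = 0; i < ncycles; i++) {
		/* threshold: greater than; complement: less than. */
		cur = f->complement ? (events[i] < f->threshold) :
		    (events[i] > f->threshold);
		/* edge: count only false-to-true transitions. */
		if (f->edge ? (cur && !prev) : cur)
			total++;
		prev = cur;
	}
	return (total);
}
```

In this model a run of consecutive cycles above the threshold adds one
count per cycle without
.Dq Li edge ,
and a single count for the whole run with it.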
2498.Pp
2499On Intel Pentium 4 processors with HTT, events are
2500divided into two classes:
2501.Pp
2502.Bl -tag -width indent -compact
2503.It "TS Events"
are those where hardware can distinguish events
2505generated on one logical processor from those generated on the
2506other.
2507.It "TI Events"
2508are those where hardware cannot differentiate between events
2509generated by multiple logical processors in a package.
2510.El
2511.Pp
2512Only TS events are allowed for use with process-mode PMCs on
2513Pentium-4/HTT CPUs.
2514.Pp
2515The event specifiers supported by Intel P4 PMCs are:
2516.Pp
2517.Bl -tag -width indent
2518.It Li p4-128bit-mmx-uop Op Li ,mask= Ns Ar flags
2519.Pq "TI event"
2520Count integer SIMD SSE2 instructions that operate on 128 bit SIMD
2521operands.
2522Qualifier
2523.Ar flags
2524can take the following value (which is also the default):
2525.Pp
2526.Bl -tag -width indent -compact
2527.It Li all
2528Count all uops operating on 128 bit SIMD integer operands in memory or
2529XMM register.
2530.El
2531.Pp
2532If an instruction contains more than one 128 bit MMX uop, then each
2533uop will be counted.
2534.It Li p4-64bit-mmx-uop Op Li ,mask= Ns Ar flags
2535.Pq "TI event"
2536Count MMX instructions that operate on 64 bit SIMD operands.
2537Qualifier
2538.Ar flags
2539can take the following value (which is also the default):
2540.Pp
2541.Bl -tag -width indent -compact
2542.It Li all
2543Count all uops operating on 64 bit SIMD integer operands in memory or
2544in MMX registers.
2545.El
2546.Pp
2547If an instruction contains more than one 64 bit MMX uop, then each
2548uop will be counted.
2549.It Li p4-b2b-cycles
2550.Pq "TI event"
2551Count back-to-back bus cycles.
2552Further documentation for this event is unavailable.
2553.It Li p4-bnr
2554.Pq "TI event"
2555Count bus-not-ready conditions.
2556Further documentation for this event is unavailable.
2557.It Li p4-bpu-fetch-request Op Li ,mask= Ns Ar qualifier
2558.Pq "TS event"
2559Count instruction fetch requests qualified by additional
2560flags specified in
2561.Ar qualifier .
2562At this point only one flag is supported:
2563.Pp
2564.Bl -tag -width indent -compact
2565.It Li tcmiss
2566Count trace cache lookup misses.
2567.El
2568.Pp
2569The default qualifier is also
2570.Dq Li mask=tcmiss .
2571.It Li p4-branch-retired Op Li ,mask= Ns Ar flags
2572.Pq "TS event"
Count retired branches.
2574Qualifier
2575.Ar flags
2576is a list of the following
2577.Ql +
2578separated strings:
2579.Pp
2580.Bl -tag -width indent -compact
2581.It Li mmnp
2582Count branches not-taken and predicted.
2583.It Li mmnm
2584Count branches not-taken and mis-predicted.
2585.It Li mmtp
2586Count branches taken and predicted.
2587.It Li mmtm
2588Count branches taken and mis-predicted.
2589.El
2590.Pp
2591The default qualifier counts all four kinds of branches.
2592.It Li p4-bsq-active-entries Op Li ,mask= Ns Ar qualifier
2593.Pq "TS event"
2594Count the number of entries (clipped at 15) currently active in the
2595BSQ.
2596Qualifier
2597.Ar qualifier
2598is a
2599.Ql +
2600separated set of the following flags:
2601.Pp
2602.Bl -tag -width indent -compact
2603.It Li req-type0 , Li req-type1
2604Forms a 2-bit number used to select the request type encoding:
2605.Pp
2606.Bl -tag -width indent -compact
2607.It Li 0
2608reads excluding read invalidate
2609.It Li 1
2610read invalidates
2611.It Li 2
2612writes other than writebacks
2613.It Li 3
2614writebacks
2615.El
2616.Pp
2617Bit
2618.Dq Li req-type1
2619is the MSB for this two bit number.
2620.It Li req-len0 , Li req-len1
2621Forms a two-bit number that specifies the request length encoding:
2622.Pp
2623.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
2630.El
2631.Pp
2632Bit
2633.Dq Li req-len1
2634is the MSB for this two bit number.
2635.It Li req-io-type
2636Count requests that are input or output requests.
2637.It Li req-lock-type
2638Count requests that lock the bus.
2639.It Li req-lock-cache
2640Count requests that lock the cache.
2641.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
2644.It Li req-dem-type
2645Count requests that are demand (not prefetches) if set.
2646Count requests that are prefetches if not set.
2647.It Li req-ord-type
2648Count requests that are ordered.
2649.It Li mem-type0 , Li mem-type1 , Li mem-type2
2650Forms a 3-bit number that specifies a memory type encoding:
2651.Pp
2652.Bl -tag -width indent -compact
2653.It Li 0
2654UC
2655.It Li 1
2656USWC
2657.It Li 4
2658WT
2659.It Li 5
2660WP
2661.It Li 6
2662WB
2663.El
2664.Pp
2665Bit
2666.Dq Li mem-type2
2667is the MSB of this 3-bit number.
2668.El
2669.Pp
2670The default qualifier has all the above bits set.
2671.Pp
2672Edge triggering using the
2673.Dq Li edge
2674qualifier should not be used with this event when counting cycles.
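.Pp
The multi-bit fields above are selected by naming one flag per set bit.
The following sketch shows the arithmetic for turning a field value
into the corresponding
.Ql +
separated keywords.
The helper
.Fn bsq_field_flags
is hypothetical and is not part of
.Lb libpmc ;
it assumes that the flag with suffix 0 is the least significant bit,
which matches the statements above that
.Dq Li req-type1 ,
.Dq Li req-len1
and
.Dq Li mem-type2
are the most significant bits of their fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libpmc): store in buf the keywords
 * "<prefix>0", "<prefix>1", ... for every bit set in the low nbits
 * bits of value, separated by '+'.  A value of 0 yields an empty
 * string because no flag needs to be named.
 * Returns 0 on success, -1 if a buffer is too small.
 */
static int
bsq_field_flags(char *buf, size_t len, const char *prefix,
    unsigned value, unsigned nbits)
{
	char kw[32];
	size_t used = 0;
	unsigned bit;
	int n;

	assert(buf != NULL && prefix != NULL && len > 0);
	buf[0] = 0;
	for (bit = 0; bit < nbits; bit++) {
		if ((value & (1u << bit)) == 0)
			continue;
		/* Prepend '+' to every keyword except the first. */
		n = snprintf(kw, sizeof(kw), "%s%s%u",
		    used > 0 ? "+" : "", prefix, bit);
		if (n < 0 || (size_t)n >= sizeof(kw) ||
		    used + (size_t)n >= len)
			return (-1);
		strcpy(buf + used, kw);
		used += (size_t)n;
	}
	return (0);
}
```

For instance, request type 3 (writebacks) maps to
.Dq Li req-type0+req-type1 ,
and memory type 6 (WB) maps to
.Dq Li mem-type1+mem-type2 .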
2675.It Li p4-bsq-allocation Op Li ,mask= Ns Ar qualifier
2676.Pq "TS event"
2677Count allocations in the bus sequence unit according to the flags
2678specified in
2679.Ar qualifier ,
2680which is a
2681.Ql +
2682separated set of the following flags:
2683.Pp
2684.Bl -tag -width indent -compact
2685.It Li req-type0 , Li req-type1
2686Forms a 2-bit number used to select the request type encoding:
2687.Pp
2688.Bl -tag -width indent -compact
2689.It Li 0
2690reads excluding read invalidate
2691.It Li 1
2692read invalidates
2693.It Li 2
2694writes other than writebacks
2695.It Li 3
2696writebacks
2697.El
2698.Pp
2699Bit
2700.Dq Li req-type1
2701is the MSB for this two bit number.
2702.It Li req-len0 , Li req-len1
2703Forms a two-bit number that specifies the request length encoding:
2704.Pp
2705.Bl -tag -width indent -compact
.It Li 0
0 chunks
.It Li 1
1 chunk
.It Li 3
8 chunks
2712.El
2713.Pp
2714Bit
2715.Dq Li req-len1
2716is the MSB for this two bit number.
2717.It Li req-io-type
2718Count requests that are input or output requests.
2719.It Li req-lock-type
2720Count requests that lock the bus.
2721.It Li req-lock-cache
2722Count requests that lock the cache.
2723.It Li req-split-type
Count requests for a bus 8-byte chunk that is split across an
8-byte boundary.
2726.It Li req-dem-type
2727Count requests that are demand (not prefetches) if set.
2728Count requests that are prefetches if not set.
2729.It Li req-ord-type
2730Count requests that are ordered.
2731.It Li mem-type0 , Li mem-type1 , Li mem-type2
2732Forms a 3-bit number that specifies a memory type encoding:
2733.Pp
2734.Bl -tag -width indent -compact
2735.It Li 0
2736UC
2737.It Li 1
2738USWC
2739.It Li 4
2740WT
2741.It Li 5
2742WP
2743.It Li 6
2744WB
2745.El
2746.Pp
2747Bit
2748.Dq Li mem-type2
2749is the MSB of this 3-bit number.
2750.El
2751.Pp
2752The default qualifier has all the above bits set.
2753.Pp
2754This event is usually used along with the
2755.Dq Li edge
2756qualifier to avoid multiple counting.
.It Li p4-bsq-cache-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count cache references as seen by the bus unit (2nd or 3rd level
cache references).
Qualifier
.Ar qualifier
is a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li rd-2ndl-hits
Count 2nd level cache hits in the shared state.
.It Li rd-2ndl-hite
Count 2nd level cache hits in the exclusive state.
.It Li rd-2ndl-hitm
Count 2nd level cache hits in the modified state.
.It Li rd-3rdl-hits
Count 3rd level cache hits in the shared state.
.It Li rd-3rdl-hite
Count 3rd level cache hits in the exclusive state.
.It Li rd-3rdl-hitm
Count 3rd level cache hits in the modified state.
.It Li rd-2ndl-miss
Count 2nd level cache misses.
.It Li rd-3rdl-miss
Count 3rd level cache misses.
.It Li wr-2ndl-miss
Count write-back lookups from the data access cache that miss the 2nd
level cache.
.El
.Pp
The default is to count all the above events.
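.Pp
The
.Ql +
separated keyword syntax used by this and the other P4 events can be
assembled mechanically.
The following is a minimal sketch, assuming a caller that holds the
keywords in an array; the helper name
.Fn p4_build_spec
is hypothetical and is not part of the library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: write
 * "<event>,mask=<key1>+<key2>..." into buf.
 * With nkeys == 0 only the event name is written, so the documented
 * default mask applies.  Returns 0 on success, -1 if buf is too small.
 */
static int
p4_build_spec(char *buf, size_t len, const char *event,
    const char *const *keys, size_t nkeys)
{
	size_t i, used;
	int n;

	assert(buf != NULL && event != NULL);
	n = snprintf(buf, len, "%s", event);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	used = (size_t)n;
	for (i = 0; i < nkeys; i++) {
		/* The first keyword follows ",mask=", later ones follow "+". */
		n = snprintf(buf + used, len - used, "%s%s",
		    i == 0 ? ",mask=" : "+", keys[i]);
		if (n < 0 || (size_t)n >= len - used)
			return (-1);
		used += (size_t)n;
	}
	return (0);
}
```
.Pp
Called with the event name
.Dq Li p4-bsq-cache-reference
and the keywords
.Dq Li rd-2ndl-miss
and
.Dq Li rd-3rdl-miss ,
the sketch produces a specifier that counts only 2nd and 3rd level
read misses.
Such a string would then be handed to
.Xr pmc_allocate 3 ;
that call is not shown here.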
.It Li p4-execution-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the execution
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus0 , Li nbogus1 , Li nbogus2 , Li nbogus3
The marked uops are not bogus.
.It Li bogus0 , Li bogus1 , Li bogus2 , Li bogus3
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to set all the above flags.
This event can be used for precise event based sampling.
.It Li p4-front-end-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the front-end
tagging mechanism.
Qualifier
.Ar flags
can contain the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default is to select both kinds of events.
This event can be used for precise event based sampling.
.It Li p4-fsb-data-activity Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each DBSY or DRDY event selected by qualifier
.Ar flags .
Qualifier
.Ar flags
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li drdy-drv
Count when this processor is driving data onto the bus.
.It Li drdy-own
Count when this processor is reading data from the bus.
.It Li drdy-other
Count when data is on the bus but not being sampled by this processor.
.It Li dbsy-drv
Count when this processor reserves the bus for use in the next cycle
in order to drive data.
.It Li dbsy-own
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will sample.
.It Li dbsy-other
Count when some agent reserves the bus for use in the next bus cycle
to drive data that this processor will not sample.
.El
.Pp
Flags
.Dq Li drdy-own
and
.Dq Li drdy-other
are mutually exclusive.
Flags
.Dq Li dbsy-own
and
.Dq Li dbsy-other
are mutually exclusive.
The default value for
.Ar flags
is
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own .
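.Pp
The two exclusion rules above can be checked before an event
specifier is built.
The following is a minimal sketch; the helper names
.Fn has_flag
and
.Fn fsb_flags_ok
are hypothetical and are not part of the library, and the sketch does
not reject unknown keywords.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: return 1 if the '+'
 * separated list in flags contains key as a whole element, else 0.
 */
static int
has_flag(const char *flags, const char *key)
{
	size_t klen, n;
	const char *p;

	assert(flags != NULL && key != NULL);
	klen = strlen(key);
	p = flags;
	while (*p != 0) {
		n = strcspn(p, "+");	/* length of the current element */
		if (n == klen && strncmp(p, key, klen) == 0)
			return (1);
		p += n;
		if (*p == '+')
			p++;
	}
	return (0);
}

/*
 * Return 1 if flags respects the exclusions documented for
 * p4-fsb-data-activity: drdy-own excludes drdy-other, and
 * dbsy-own excludes dbsy-other.  Return 0 otherwise.
 */
static int
fsb_flags_ok(const char *flags)
{
	if (has_flag(flags, "drdy-own") && has_flag(flags, "drdy-other"))
		return (0);
	if (has_flag(flags, "dbsy-own") && has_flag(flags, "dbsy-other"))
		return (0);
	return (1);
}
```
.Pp
The documented default,
.Dq Li drdy-drv+drdy-own+dbsy-drv+dbsy-own ,
passes this check, while
.Dq Li drdy-own+drdy-other
does not.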
.It Li p4-global-power-events Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count cycles during which the processor is not stopped.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li running
Count cycles when the processor is active.
.El
.It Li p4-instr-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count instructions retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogusntag
Count non-bogus instructions that are not tagged.
.It Li nbogustag
Count non-bogus instructions that are tagged.
.It Li bogusntag
Count bogus instructions that are not tagged.
.It Li bogustag
Count bogus instructions that are tagged.
.El
.Pp
The default qualifier counts all the above kinds of instructions.
.It Li p4-ioq-active-entries Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count the number of entries (clipped at 15) in the IOQ that are
active.
The event masks are specified by qualifier
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier should not be used when counting cycles with this event.
The exact behaviour of this event depends on the processor revision.
.It Li p4-ioq-allocation Xo
.Op Li ,mask= Ns Ar qualifier
.Op Li ,busreqtype= Ns Ar req-type
.Xc
.Pq "TS event"
Count various types of transactions on the bus matching the flags set
in
.Ar qualifier
and
.Ar req-type .
.Pp
Qualifier
.Ar qualifier
is a
.Ql +
separated set of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li all-read
Count read entries.
.It Li all-write
Count write entries.
.It Li mem-uc
Count entries accessing uncacheable memory.
.It Li mem-wc
Count entries accessing write-combining memory.
.It Li mem-wt
Count entries accessing write-through memory.
.It Li mem-wp
Count entries accessing write-protected memory.
.It Li mem-wb
Count entries accessing write-back memory.
.It Li own
Count store requests driven by the processor (i.e., not by other
processors or by DMA).
.It Li other
Count store requests driven by other processors or by DMA.
.It Li prefetch
Include hardware and software prefetch requests in the count.
.El
.Pp
The default value for
.Ar qualifier
is to enable all the above flags.
.Pp
The
.Ar req-type
qualifier is a 5-bit number that can additionally be used to select a
specific bus request type.
The default is 0.
.Pp
The
.Dq Li edge
qualifier is normally used with this event to prevent multiple
counting.
The exact behaviour of this event depends on the processor revision.
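.Pp
A specifier for this event combines a mask, a bus request type and,
optionally, the
.Dq Li edge
qualifier.
The following is a minimal sketch of how such a string might be
formatted and range-checked.
The helper name
.Fn ioq_alloc_spec
is hypothetical and is not part of the library, and writing the
request type as a decimal number is an assumption, since this manual
page does not state the accepted number syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: format an event specifier
 * for p4-ioq-allocation.  The request type must fit in 5 bits (0..31)
 * and is written in decimal (an assumption).  Returns 0 on success,
 * -1 on a bad argument or a too-small buffer.
 */
static int
ioq_alloc_spec(char *buf, size_t len, const char *mask,
    unsigned int reqtype, int edge)
{
	int n;

	assert(buf != NULL && mask != NULL);
	if (reqtype > 0x1f)
		return (-1);
	/* A comma inside the mask would start a new qualifier. */
	if (strchr(mask, ',') != NULL)
		return (-1);
	n = snprintf(buf, len, "p4-ioq-allocation,mask=%s,busreqtype=%u%s",
	    mask, reqtype, edge ? ",edge" : "");
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```
.Pp
A request type of 32 or more is rejected because it does not fit in
the 5-bit field described above.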
.It Li p4-itlb-reference Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count translations using the instruction translation look-aside
buffer.
The
.Ar qualifier
argument is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li hit
Count ITLB hits.
.It Li miss
Count ITLB misses.
.It Li hit-uc
Count uncacheable ITLB hits.
.El
.Pp
If no
.Ar qualifier
is specified, the default is to count all three kinds of ITLB
translations.
.It Li p4-load-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count replayed events at the load port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-ld
Count split loads.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-ld .
.It Li p4-mispred-branch-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted IA-32 branch instructions.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count non-bogus retired branch instructions.
.El
.It Li p4-machine-clear Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of pipeline clears seen by the processor.
Qualifier
.Ar flags
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li clear
Count for a portion of the many cycles when the machine is being
cleared for any reason.
.It Li moclear
Count machine clears due to memory ordering issues.
.It Li smclear
Count machine clears due to self-modifying code.
.El
.Pp
Use qualifier
.Dq Li edge
to get a count of occurrences of machine clears.
The default qualifier is
.Dq Li clear .
.It Li p4-memory-cancel Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the cancelling of various kinds of requests in the data cache
address control unit of the CPU.
The qualifier
.Ar event-list
is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li st-rb-full
Requests cancelled because no store request buffer was available.
.It Li 64k-conf
Requests that conflict due to 64K aliasing.
.El
.Pp
If
.Ar event-list
is not specified, then the default is to count both kinds of events.
.It Li p4-memory-complete Op Li ,mask= Ns Ar event-list
.Pq "TS event"
Count the completion of load split, store split, uncacheable split and
uncacheable load operations selected by qualifier
.Ar event-list .
The qualifier
.Ar event-list
is a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li lsc
Count load splits completed, excluding loads from uncacheable or
write-combining areas.
.It Li ssc
Count any split stores completed.
.El
.Pp
The default is to count both kinds of operations.
.It Li p4-mob-load-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count load replays triggered by the memory order buffer.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following flags:
.Pp
.Bl -tag -width indent -compact
.It Li no-sta
Count replays because of unknown store addresses.
.It Li no-std
Count replays because of unknown store data.
.It Li partial-data
Count replays because of partially overlapped data accesses between
load and store operations.
.It Li unalgn-addr
Count replays because of mismatches in the lower 4 bits of load and
store operations.
.El
.Pp
The default qualifier is
.Dq Li no-sta+no-std+partial-data+unalgn-addr .
.It Li p4-packed-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed double-precision operands.
.El
.It Li p4-packed-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count packed single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on packed single-precision operands.
.El
.It Li p4-page-walk-type Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count page walks performed by the page miss handler.
Qualifier
.Ar qualifier
can be a
.Ql +
separated list of the following keywords:
.Pp
.Bl -tag -width indent -compact
.It Li dtmiss
Count page walks for data TLB misses.
.It Li itmiss
Count page walks for instruction TLB misses.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li dtmiss+itmiss .
.It Li p4-replay-event Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of tagged uops selected through the replay
tagging mechanism.
Qualifier
.Ar flags
contains a
.Ql +
separated set of the following strings:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
The marked uops are not bogus.
.It Li bogus
The marked uops are bogus.
.El
.Pp
This event requires additional (upstream) events to be allocated to
perform the desired uop tagging.
The default qualifier counts both kinds of uops.
This event can be used for precise event based sampling.
.It Li p4-resource-stall Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the occurrence or latency of stalls in the allocator.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li sbfull
A stall due to the lack of store buffers.
.El
.It Li p4-response
.Pq "TI event"
Count different types of responses.
Further documentation on this event is not available.
.It Li p4-retired-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count direct and indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-retired-mispred-branch-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count mispredicted branches retired.
Qualifier
.Ar flags
contains a
.Ql +
separated list of strings:
.Pp
.Bl -tag -width indent -compact
.It Li conditional
Count conditional jumps.
.It Li call
Count indirect call branches.
.It Li return
Count return branches.
.It Li indirect
Count returns, indirect calls or indirect jumps.
.El
.Pp
The default qualifier counts all the above branch types.
.It Li p4-scalar-dp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar double-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar double-precision operands.
.El
.It Li p4-scalar-sp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of scalar single-precision uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all uops operating on scalar single-precision operands.
.El
.It Li p4-snoop
.Pq "TI event"
Count snoop traffic.
Further documentation on this event is not available.
.It Li p4-sse-input-assist Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times an assist is required to handle problems
with the operands for SSE and SSE2 operations.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count assists for all SSE and SSE2 uops.
.El
.It Li p4-store-port-replay Op Li ,mask= Ns Ar qualifier
.Pq "TS event"
Count events replayed at the store port.
Qualifier
.Ar qualifier
can take on one value:
.Pp
.Bl -tag -width indent -compact
.It Li split-st
Count split stores.
.El
.Pp
The default value for
.Ar qualifier
is
.Dq Li split-st .
.It Li p4-tc-deliver-mode Op Li ,mask= Ns Ar qualifier
.Pq "TI event"
Count the duration in cycles of operating modes of the trace cache and
decode engine.
The desired operating mode is selected by
.Ar qualifier ,
which is a list of the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li DD
Both logical processors are in deliver mode.
.It Li DB
Logical processor 0 is in deliver mode while logical processor 1 is in
build mode.
.It Li DI
Logical processor 0 is in deliver mode while logical processor 1 is
halted, or in machine clear, or transitioning to a long microcode
flow.
.It Li BD
Logical processor 0 is in build mode while logical processor 1 is in
deliver mode.
.It Li BB
Both logical processors are in build mode.
.It Li BI
Logical processor 0 is in build mode while logical processor 1 is
halted, or in machine clear or transitioning to a long microcode
flow.
.It Li ID
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in deliver mode.
.It Li IB
Logical processor 0 is halted, or in machine clear or transitioning to
a long microcode flow while logical processor 1 is in build mode.
.El
.Pp
If there is only one logical processor in the processor package then
the qualifier for logical processor 1 is ignored.
If no qualifier is specified, the default qualifier is
.Dq Li DD+DB+DI+BD+BB+BI+ID+IB .
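.Pp
Each two-letter keyword above pairs the state of logical processor 0
(first letter) with that of logical processor 1 (second letter).
The following sketch decodes a keyword on that basis.
The helper names
.Fn tc_state_name
and
.Fn tc_mode_decode
are hypothetical and are not part of the library, and the label
.Dq inactive
is shorthand chosen here for the state described above as halted, in
machine clear, or transitioning to a long microcode flow.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpmc: describe one letter of a
 * p4-tc-deliver-mode keyword.  Returns NULL for an unknown letter.
 */
static const char *
tc_state_name(char c)
{
	switch (c) {
	case 'D':
		return ("deliver");
	case 'B':
		return ("build");
	case 'I':
		return ("inactive");	/* shorthand, see the lead-in */
	default:
		return (NULL);
	}
}

/*
 * Split a keyword such as "DB" into the states of logical processor 0
 * and logical processor 1.  Returns 0 on success, or -1 if kw is not
 * one of the eight listed keywords ("II" is not among them).
 */
static int
tc_mode_decode(const char *kw, const char **lp0, const char **lp1)
{
	assert(kw != NULL && lp0 != NULL && lp1 != NULL);
	if (strlen(kw) != 2 || strcmp(kw, "II") == 0)
		return (-1);
	*lp0 = tc_state_name(kw[0]);
	*lp1 = tc_state_name(kw[1]);
	return ((*lp0 != NULL && *lp1 != NULL) ? 0 : -1);
}
```
.Pp
For the keyword
.Dq Li DB
the sketch reports deliver mode for logical processor 0 and build
mode for logical processor 1.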
.It Li p4-tc-ms-xfer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count the number of times uop delivery changed from the trace cache to
MS ROM.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li cisc
Count TC to MS transfers.
.El
.It Li p4-uop-queue-writes Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the number of valid uops written to the uop queue.
Qualifier
.Ar flags
is a list of the following strings, separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li from-tc-build
Count uops being written from the trace cache in build mode.
.It Li from-tc-deliver
Count uops being written from the trace cache in deliver mode.
.It Li from-rom
Count uops being written from microcode ROM.
.El
.Pp
The default qualifier counts all the above kinds of uops.
.It Li p4-uop-type Op Li ,mask= Ns Ar flags
.Pq "TS event"
This event is used in conjunction with the front-end at-retirement
mechanism to tag load and store uops.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li tagloads
Mark uops that are load operations.
.It Li tagstores
Mark uops that are store operations.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-uops-retired Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count uops retired during a clock cycle.
Qualifier
.Ar flags
comprises the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li nbogus
Count marked uops that are not bogus.
.It Li bogus
Count marked uops that are bogus.
.El
.Pp
The default qualifier counts both kinds of uops.
.It Li p4-wc-buffer Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count write-combining buffer operations.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li wcb-evicts
WC buffer evictions due to any cause.
.It Li wcb-full-evict
WC buffer evictions due to no WC buffer being available.
.El
.Pp
The default qualifier counts both kinds of evictions.
.It Li p4-x87-assist Op Li ,mask= Ns Ar flags
.Pq "TS event"
Count the retirement of x87 instructions that required special
handling.
Qualifier
.Ar flags
contains the following strings separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li fpsu
Count instructions that saw an FP stack underflow.
.It Li fpso
Count instructions that saw an FP stack overflow.
.It Li poao
Count instructions that saw an x87 output overflow.
.It Li poau
Count instructions that saw an x87 output underflow.
.It Li prea
Count instructions that needed an x87 input assist.
.El
.Pp
The default qualifier counts all the above types of instruction
retirements.
.It Li p4-x87-fp-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count x87 floating-point uops.
Qualifier
.Ar flags
can take the following value (which is also the default):
.Pp
.Bl -tag -width indent -compact
.It Li all
Count all x87 floating-point uops.
.El
.Pp
If an instruction contains more than one x87 floating-point uop, then
all x87 floating-point uops will be counted.
This event does not count x87 floating-point data movement operations.
.It Li p4-x87-simd-moves-uop Op Li ,mask= Ns Ar flags
.Pq "TI event"
Count each x87 FPU, MMX, SSE, or SSE2 uop that loads data, stores
data, or performs a register-to-register move.
This event does not count integer move uops.
Qualifier
.Ar flags
may contain the following keywords separated by
.Ql +
characters:
.Pp
.Bl -tag -width indent -compact
.It Li allp0
Count all x87 and SIMD store and move uops.
.It Li allp2
Count all x87 and SIMD load uops.
.El
.Pp
The default is to count all uops.
.Pq Errata
This event may be affected by processor erratum N43.
.El
.Ss "Cascading P4 PMCs"
Support for PMC cascading is currently incomplete.
While individual event counters may be allocated with a
.Dq Li cascade
qualifier, the current API does not offer the ability
to name and allocate all the resources needed for a
cascaded event counter pair in a single operation.
.Ss "Precise Event Based Sampling"
Support for precise event based sampling is currently
unimplemented.
.Sh COMPATIBILITY
The interface between the
.Nm pmc
library and the
.Xr hwpmc 4
driver is intended to be private to the implementation and may
change.
In order to ease forward compatibility with future versions of the
.Xr hwpmc 4
driver, applications are urged to dynamically link with the
.Nm pmc
library.
.Pp
The
.Nm pmc
API is
.Ud
.Sh SEE ALSO
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr pmcstat 8
.Sh HISTORY
The
.Nm pmc
library first appeared in
.Fx 6.0 .
.Sh AUTHORS
The
.Lb libpmc
was written by
.An "Joseph Koshy"
.Aq jkoshy@FreeBSD.org .