.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 15, 2020
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_new_entry ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft struct devstat *
.Fo devstat_new_entry
.Fa "const void *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
The devstat subsystem is an interface for recording device
statistics, as its name implies.
The idea is to keep reasonably detailed
statistics while utilizing a minimum amount of CPU time to record them.
Thus, no statistical calculations are actually performed in the kernel
portion of the
.Nm
code.
Instead, that is left for user programs to handle.
.Pp
The historical and antiquated
.Nm
model assumed a single active IO operation per device, which is not accurate
for most disk-like drivers in the 2000s and beyond.
New consumers of the interface should almost certainly use only the "bio"
variants of the start and end transaction routines.
.Pp
.Fn devstat_new_entry
allocates and initializes a
.Va devstat
structure and returns a pointer to it.
.Fn devstat_new_entry
takes several arguments:
.Bl -tag -width device_type
.It dev_name
The device name, e.g., da, cd, sa.
.It unit_number
Device unit number.
.It block_size
Block size of the device, if supported.
If the device does not support a
block size, or if the block size is unknown at the time the device is added
to the
.Nm
list, it should be set to 0.
.It flags
Flags indicating operations supported or not supported by the device.
See below for details.
.It device_type
The device type.
This is broken into three sections: base device type
(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
or other) and a pass-through flag to indicate pass-through devices.
See below for a complete list of types.
.It priority
The device priority.
The priority is used to determine how devices are
sorted within
.Nm devstat Ns 's
list of devices.
Devices are sorted first by priority (highest to lowest),
and then by attach order.
See below for a complete list of available
priorities.
.El
.Pp
.Fn devstat_remove_entry
removes a device from the
.Nm
subsystem.
It takes the devstat structure for the device in question as
an argument.
The
.Nm
generation number is incremented and the number of devices is decremented.
.Pp
.Fn devstat_start_transaction
registers the start of a transaction with the
.Nm
subsystem.
Optionally, if the caller already has a
.Fn binuptime
value available, it may be passed in
.Fa *now .
Usually the caller can just pass
.Dv NULL
for
.Fa now ,
and the routine will gather the current
.Fn binuptime
itself.
The busy count is incremented with each transaction start.
When a device goes from idle to busy, the system uptime is recorded in the
.Va busy_from
field of the
.Va devstat
structure.
.Pp
.Fn devstat_start_transaction_bio
records the
.Fn binuptime
in the provided bio's
.Fa bio_t0
and then invokes
.Fn devstat_start_transaction .
.Pp
.Fn devstat_end_transaction
registers the end of a transaction with the
.Nm
subsystem.
It takes six arguments:
.Bl -tag -width tag_type
.It ds
The
.Va devstat
structure for the device in question.
.It bytes
The number of bytes transferred in this transaction.
.It tag_type
Transaction tag type.
See below for tag types.
.It flags
Transaction flags indicating whether the transaction was a read, write, or
whether no data was transferred.
.It now
The
.Fn binuptime
at the end of the transaction, or
.Dv NULL .
.It then
The
.Fn binuptime
at the beginning of the transaction, or
.Dv NULL .
.El
.Pp
If
.Fa now
is
.Dv NULL ,
it collects the current time from
.Fn binuptime .
If
.Fa then
is
.Dv NULL ,
the operation is not tracked in the
.Va devstat
.Fa duration
table.
.Pp
.Fn devstat_end_transaction_bio
is a thin wrapper for
.Fn devstat_end_transaction_bio_bt
with a
.Dv NULL
.Fa now
parameter.
.Pp
.Fn devstat_end_transaction_bio_bt
is a wrapper for
.Fn devstat_end_transaction
which pulls all needed information from a
.Va "struct bio"
prepared by
.Fn devstat_start_transaction_bio .
The bio must be ready for
.Fn biodone
(i.e.,
.Fa bio_bcount
and
.Fa bio_resid
must be correctly initialized).
.Pp
The
.Va devstat
structure is composed of the following fields:
.Bl -tag -width dev_creation_time
.It sequence0 ,
.It sequence1
An implementation detail used to gather consistent snapshots of device
statistics.
.It start_count
Number of operations started.
.It end_count
Number of operations completed.
The
.Dq busy_count
can be calculated by subtracting
.Fa end_count
from
.Fa start_count .
.Fa ( sequence0
and
.Fa sequence1
are used to get a consistent snapshot.)
This is the current number of outstanding transactions for the device.
This should never go below zero, and on an idle device it should be zero.
If either one of these conditions is not true, it indicates a problem.
.Pp
There should be one and only one
transaction start event and one transaction end event for each transaction.
.It dev_links
Each
.Va devstat
structure is placed in a linked list when it is registered.
The
.Va dev_links
field contains a pointer to the next entry in the list of
.Va devstat
structures.
.It device_number
The device number is a unique identifier for each device.
The device
number is incremented for each new device that is registered.
The device
number is currently only a 32-bit integer, but it could be enlarged if
someone has a system with more than four billion device arrival events.
.It device_name
The device name is a text string given by the registering driver to
identify itself (e.g.,
.Dq da ,
.Dq cd ,
.Dq sa ) .
.It unit_number
The unit number identifies the particular instance of the peripheral driver
in question.
.It bytes[4]
This array contains the number of bytes that have been read (index
.Dv DEVSTAT_READ ) ,
written (index
.Dv DEVSTAT_WRITE ) ,
freed or erased (index
.Dv DEVSTAT_FREE ) ,
or other (index
.Dv DEVSTAT_NO_DATA ) .
All values are unsigned 64-bit integers.
.It operations[4]
This array contains the number of operations of a given type that have been
performed.
The indices are identical to those for
.Fa bytes
above.
.Dv DEVSTAT_NO_DATA
or "other" represents the number of transactions to the device which are
neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.It duration[4]
This array contains the total bintime corresponding to completed operations of
a given type.
The indices are identical to those for
.Fa bytes
above.
(Operations that complete using the historical
.Fn devstat_end_transaction
API and do not provide a non-NULL
.Fa then
are not accounted for.)
.It busy_time
This is the amount of time that the device busy count has been greater than
zero.
This is only updated when the busy count returns to zero.
.It creation_time
This is the time, as reported by
.Fn getmicrotime ,
that the device was registered.
.It block_size
This is the block size of the device, if the device has a block size.
.It tag_types
This is an array of counters to record the number of various tag types that
are sent to a device.
See below for a list of tag types.
.It busy_from
If the device is not busy, this is the time that a transaction last completed.
If the device is busy, this is the most recent of either the time that the
device became busy, or the time that the last transaction completed.
.It flags
These flags indicate which statistics measurements are supported by a
particular device.
These flags are primarily intended to serve as an aid
to userland programs that decipher the statistics.
.It device_type
This is the device type.
It consists of three parts: the device type
(e.g., direct access, CDROM, sequential access, etc.), the interface (IDE,
SCSI or other) and whether or not the device in question is a pass-through
driver.
See below for a complete list of device types.
.It priority
This is the priority.
This is the first parameter used to determine where
to insert a device in the
.Nm
list.
The second parameter is attach order.
See below for a list of available priorities.
.It id
Identification for GEOM nodes.
.El
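.Pp
As a minimal sketch of the busy count calculation described for
.Fa end_count
above, the following userland C fragment subtracts the two counters.
The
.Vt struct counters
type and the
.Fn busy_count
helper are hypothetical stand-ins for illustration, and the 32-bit counter
width is an assumption; neither is part of the kernel interface.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two counters kept in struct devstat. */
struct counters {
	uint32_t start_count;	/* operations started (assumed 32-bit) */
	uint32_t end_count;	/* operations completed (assumed 32-bit) */
};

/*
 * Number of outstanding transactions.
 * The subtraction is done in unsigned arithmetic, so the difference is
 * still correct after the counters wrap around.
 */
static uint32_t
busy_count(const struct counters *c)
{
	return (c->start_count - c->end_count);
}
```

On an idle device the two counters are equal and the helper returns zero.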
.Pp
Each device is given a device type.
Pass-through devices have the same underlying device type and interface as the
device they provide an interface for, but they also have the pass-through flag
set.
The base device types are identical to the
.Tn SCSI
device type numbers, so with
.Tn SCSI
peripherals, the device type returned from an inquiry is usually ORed with the
.Tn SCSI
interface type and the pass-through flag if appropriate.
The device type
flags are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_SEQUENTIAL	= 0x001,
	DEVSTAT_TYPE_PRINTER	= 0x002,
	DEVSTAT_TYPE_PROCESSOR	= 0x003,
	DEVSTAT_TYPE_WORM	= 0x004,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_SCANNER	= 0x006,
	DEVSTAT_TYPE_OPTICAL	= 0x007,
	DEVSTAT_TYPE_CHANGER	= 0x008,
	DEVSTAT_TYPE_COMM	= 0x009,
	DEVSTAT_TYPE_ASC0	= 0x00a,
	DEVSTAT_TYPE_ASC1	= 0x00b,
	DEVSTAT_TYPE_STORARRAY	= 0x00c,
	DEVSTAT_TYPE_ENCLOSURE	= 0x00d,
	DEVSTAT_TYPE_FLOPPY	= 0x00e,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_OTHER	= 0x030,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
} devstat_type_flags;
.Ed
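.Pp
The three parts of a device type can be combined with a bitwise OR and
separated again with the two masks.
The following is a sketch only: the constants are copied from the enumeration
above, while the
.Fn type_base ,
.Fn type_interface ,
and
.Fn type_is_pass
helpers are hypothetical names that do not exist in the kernel.

```c
/* Subset of devstat_type_flags, values copied from the list above. */
enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
};

/* Hypothetical helper: base device type (low four bits). */
static int
type_base(int type)
{
	return (type & DEVSTAT_TYPE_MASK);
}

/* Hypothetical helper: interface type (bits four through seven). */
static int
type_interface(int type)
{
	return (type & DEVSTAT_TYPE_IF_MASK);
}

/* Hypothetical helper: non-zero if the pass-through flag is set. */
static int
type_is_pass(int type)
{
	return ((type & DEVSTAT_TYPE_PASS) != 0);
}
```

For example, a pass-through device for a SCSI CD-ROM drive would carry
DEVSTAT_TYPE_CDROM | DEVSTAT_TYPE_IF_SCSI | DEVSTAT_TYPE_PASS.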
.Pp
Devices have a priority associated with them, which controls roughly where
they are placed in the
.Nm
list.
The priorities are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_PRIORITY_MIN	= 0x000,
	DEVSTAT_PRIORITY_OTHER	= 0x020,
	DEVSTAT_PRIORITY_PASS	= 0x030,
	DEVSTAT_PRIORITY_FD	= 0x040,
	DEVSTAT_PRIORITY_WFD	= 0x050,
	DEVSTAT_PRIORITY_TAPE	= 0x060,
	DEVSTAT_PRIORITY_CD	= 0x090,
	DEVSTAT_PRIORITY_DISK	= 0x110,
	DEVSTAT_PRIORITY_ARRAY	= 0x120,
	DEVSTAT_PRIORITY_MAX	= 0xfff
} devstat_priority;
.Ed
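.Pp
The ordering rule (highest priority first, then attach order) can be
illustrated with a comparison function.
This is a userland sketch under stated assumptions: the
.Vt struct entry
type, its
.Fa attach_order
field, and the use of
.Fn qsort
are all hypothetical, and this page does not say how the kernel keeps its
list in order.

```c
#include <stdlib.h>

/* Hypothetical record; not the kernel's struct devstat. */
struct entry {
	int priority;		/* a devstat_priority value */
	unsigned attach_order;	/* assumed: lower means attached earlier */
};

/* Order by priority, highest first, then by attach order, earliest first. */
static int
entry_cmp(const void *a, const void *b)
{
	const struct entry *x = a;
	const struct entry *y = b;

	if (x->priority != y->priority)
		return (x->priority > y->priority ? -1 : 1);
	if (x->attach_order != y->attach_order)
		return (x->attach_order < y->attach_order ? -1 : 1);
	return (0);
}

/* Sort an array of entries into the order described above. */
static void
sort_entries(struct entry *v, size_t n)
{
	qsort(v, n, sizeof(*v), entry_cmp);
}
```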
.Pp
Each device has associated with it flags to indicate what operations are
supported or not supported.
The
.Va devstat_support_flags
values are as follows:
.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
.It DEVSTAT_ALL_SUPPORTED
Every statistic type is supported by the device.
.It DEVSTAT_NO_BLOCKSIZE
This device does not have a blocksize.
.It DEVSTAT_NO_ORDERED_TAGS
This device does not support ordered tags.
.It DEVSTAT_BS_UNAVAILABLE
This device supports a blocksize, but it is currently unavailable.
This
flag is most often used with removable media drives.
.El
.Pp
Transactions to a device fall into one of four categories, which are
represented in the
.Va flags
passed into
.Fn devstat_end_transaction .
The transaction types are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;
#define DEVSTAT_N_TRANS_FLAGS   4
.Ed
.Pp
.Dv DEVSTAT_NO_DATA
covers transactions to the device which are neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
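.Pp
Because the transaction flags double as array indices, per-type accounting
reduces to two array updates.
The fragment below is a simplified userland model, not the kernel code: the
.Vt struct totals
type and the
.Fn account
function are hypothetical, and only the enumeration values and the 64-bit
counter width are taken from this page.

```c
#include <stdint.h>

typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;
#define DEVSTAT_N_TRANS_FLAGS	4

/* Hypothetical stand-in for the bytes[] and operations[] arrays. */
struct totals {
	uint64_t bytes[DEVSTAT_N_TRANS_FLAGS];
	uint64_t operations[DEVSTAT_N_TRANS_FLAGS];
};

/* Record one completed transaction of the given type. */
static void
account(struct totals *t, devstat_trans_flags flags, uint32_t bytes)
{
	t->bytes[flags] += bytes;
	t->operations[flags]++;
}
```

A test unit ready command would be recorded as account(&t, DEVSTAT_NO_DATA, 0),
which bumps the operation count without adding any bytes.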
.Pp
There are four possible values for the
.Va tag_type
argument to
.Fn devstat_end_transaction :
.Bl -tag -width DEVSTAT_TAG_ORDERED
.It DEVSTAT_TAG_SIMPLE
The transaction had a simple tag.
.It DEVSTAT_TAG_HEAD
The transaction had a head of queue tag.
.It DEVSTAT_TAG_ORDERED
The transaction had an ordered tag.
.It DEVSTAT_TAG_NONE
The device does not support tags.
.El
.Pp
The tag type values correspond to the lower four bits of the
.Tn SCSI
tag definitions.
In CAM, for instance, the
.Va tag_action
from the CCB is ANDed with 0xf to determine the tag type to pass in to
.Fn devstat_end_transaction .
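.Pp
Extracting the lower four bits is a bitwise AND with a 0xf mask.
The sketch below makes two assumptions that this page does not confirm: the
numeric values given to the
.Dv DEVSTAT_TAG_*
constants, and the
.Tn SCSI
queue tag message codes 0x20 through 0x22.
The
.Fn tag_type_from_action
helper is a hypothetical name.

```c
/* Assumed values; this manual page does not list them. */
enum {
	DEVSTAT_TAG_SIMPLE	= 0x00,
	DEVSTAT_TAG_HEAD	= 0x01,
	DEVSTAT_TAG_ORDERED	= 0x02,
	DEVSTAT_TAG_NONE	= 0x03
};

/* Assumed SCSI queue tag message codes. */
#define MSG_SIMPLE_Q_TAG	0x20
#define MSG_HEAD_OF_Q_TAG	0x21
#define MSG_ORDERED_Q_TAG	0x22

/* Hypothetical helper: keep only the lower four bits of the tag action. */
static int
tag_type_from_action(unsigned int tag_action)
{
	return ((int)(tag_action & 0xf));
}
```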
.Pp
There is a macro,
.Dv DEVSTAT_VERSION ,
that is defined in
.In sys/devicestat.h .
This is the current version of the
.Nm
subsystem, and it should be incremented each time a change is made that
would require recompilation of userland programs that access
.Nm
statistics.
Userland programs use this version, via the
.Va kern.devstat.version
.Nm sysctl
variable, to determine whether they are in sync with the kernel
.Nm
structures.
.Sh SEE ALSO
.Xr systat 1 ,
.Xr devstat 3 ,
.Xr iostat 8 ,
.Xr rpc.rstatd 8 ,
.Xr vmstat 8
.Sh HISTORY
The
.Nm
statistics system appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
There may be a need for
.Fn spl
protection around some of the
.Nm
list manipulation code to ensure, for example, that the list of devices
is not changed while someone is fetching the
.Va kern.devstat.all
.Nm sysctl
variable.
550