.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 15, 2020
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_new_entry ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft struct devstat *
.Fo devstat_new_entry
.Fa "const void *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
84.Sh DESCRIPTION
85The devstat subsystem is an interface for recording device
86statistics, as its name implies.
87The idea is to keep reasonably detailed
88statistics while utilizing a minimum amount of CPU time to record them.
89Thus, no statistical calculations are actually performed in the kernel
90portion of the
91.Nm
92code.
93Instead, that is left for user programs to handle.
94.Pp
95The historical and antiquated
96.Nm
97model assumed a single active IO operation per device, which is not accurate
98for most disk-like drivers in the 2000s and beyond.
99New consumers of the interface should almost certainly use only the "bio"
100variants of the start and end transacation routines.
101.Pp
102.Fn devstat_new_entry
103allocates and initializes
104.Va devstat
105structure and returns a pointer to it.
106.Fn devstat_new_entry
107takes several arguments:
108.Bl -tag -width device_type
109.It dev_name
110The device name, e.g., da, cd, sa.
111.It unit_number
112Device unit number.
113.It block_size
114Block size of the device, if supported.
115If the device does not support a
116block size, or if the blocksize is unknown at the time the device is added
117to the
118.Nm
119list, it should be set to 0.
120.It flags
121Flags indicating operations supported or not supported by the device.
122See below for details.
123.It device_type
124The device type.
125This is broken into three sections: base device type
126(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
127or other) and a pass-through flag to indicate pas-through devices.
128See below for a complete list of types.
129.It priority
130The device priority.
131The priority is used to determine how devices are
132sorted within
133.Nm devstat Ns 's
134list of devices.
135Devices are sorted first by priority (highest to lowest),
136and then by attach order.
137See below for a complete list of available
138priorities.
139.El
140.Pp
141.Fn devstat_remove_entry
142removes a device from the
143.Nm
144subsystem.
145It takes the devstat structure for the device in question as
146an argument.
147The
148.Nm
149generation number is incremented and the number of devices is decremented.
150.Pp
151.Fn devstat_start_transaction
152registers the start of a transaction with the
153.Nm
154subsystem.
155Optionally, if the caller already has a
156.Fn binuptime
157value available, it may be passed in
158.Fa *now .
159Usually the caller can just pass
160.Dv NULL
161for
162.Fa now ,
163and the routine will gather the current
164.Fn binuptime
165itself.
166The busy count is incremented with each transaction start.
167When a device goes from idle to busy, the system uptime is recorded in the
168.Va busy_from
169field of the
170.Va devstat
171structure.
172.Pp
173.Fn devstat_start_transaction_bio
174records the
175.Fn binuptime
176in the provided bio's
177.Fa bio_t0
178and then invokes
179.Fn devstat_start_transaction .
180.Pp
181.Fn devstat_end_transaction
182registers the end of a transaction with the
183.Nm
184subsystem.
185It takes six arguments:
186.Bl -tag -width tag_type
187.It ds
188The
189.Va devstat
190structure for the device in question.
191.It bytes
192The number of bytes transferred in this transaction.
193.It tag_type
194Transaction tag type.
195See below for tag types.
196.It flags
197Transaction flags indicating whether the transaction was a read, write, or
198whether no data was transferred.
199.It now
200The
201.Fn binuptime
202at the end of the transaction, or
203.Dv NULL .
204.It then
205The
206.Fn binuptime
207at the beginning of the transaction, or
208.Dv NULL .
209.El
210.Pp
211If
212.Fa now
213is
214.Dv NULL ,
215it collects the current time from
216.Fn binuptime .
217If
218.Fa then
219is
220.Dv NULL ,
221the operation is not tracked in the
222.Va devstat
223.Fa duration
224table.
225.Pp
226.Fn devstat_end_transaction_bio
227is a thin wrapper for
228.Fn devstat_end_transaction_bio_bt
229with a
230.Dv NULL
231.Fa now
232parameter.
233.Pp
234.Fn devstat_end_transaction_bio_bt
235is a wrapper for
236.Fn devstat_end_transaction
237which pulls all needed information from a
238.Va "struct bio"
239prepared by
240.Fn devstat_start_transaction_bio .
241The bio must be ready for
242.Fn biodone
243(i.e.,
244.Fa bio_bcount
245and
246.Fa bio_resid
247must be correctly initialized).
248.Pp
249The
250.Va devstat
251structure is composed of the following fields:
252.Bl -tag -width dev_creation_time
253.It sequence0 ,
254.It sequence1
255An implementation detail used to gather consistent snapshots of device
256statistics.
257.It start_count
258Number of operations started.
259.It end_count
260Number of operations completed.
261The
262.Dq busy_count
263can be calculated by subtracting
264.Fa end_count
265from
266.Fa start_count .
267.Fa ( sequence0
268and
269.Fa sequence1
270are used to get a consistent snapshot.)
271This is the current number of outstanding transactions for the device.
272This should never go below zero, and on an idle device it should be zero.
273If either one of these conditions is not true, it indicates a problem.
274.Pp
275There should be one and only one
276transaction start event and one transaction end event for each transaction.
277.It dev_links
278Each
279.Va devstat
280structure is placed in a linked list when it is registered.
281The
282.Va dev_links
283field contains a pointer to the next entry in the list of
284.Va devstat
285structures.
286.It device_number
287The device number is a unique identifier for each device.
288The device
289number is incremented for each new device that is registered.
290The device
291number is currently only a 32-bit integer, but it could be enlarged if
292someone has a system with more than four billion device arrival events.
293.It device_name
294The device name is a text string given by the registering driver to
295identify itself.
296(e.g.,
297.Dq da ,
298.Dq cd ,
299.Dq sa ,
300etc.)
301.It unit_number
302The unit number identifies the particular instance of the peripheral driver
303in question.
304.It bytes[4]
305This array contains the number of bytes that have been read (index
306.Dv DEVSTAT_READ ) ,
307written (index
308.Dv DEVSTAT_WRITE ) ,
309freed or erased (index
310.Dv DEVSTAT_FREE ) ,
311or other (index
312.Dv DEVSTAT_NO_DATA ) .
313All values are unsigned 64-bit integers.
314.It operations[4]
315This array contains the number of operations of a given type that have been
316performed.
317The indices are identical to those for
318.Fa bytes
319above.
320.Dv DEVSTAT_NO_DATA
321or "other" represents the number of transactions to the device which are
322neither reads, writes, nor frees.
323For instance,
324.Tn SCSI
325drivers often send a test unit ready command to
326.Tn SCSI
327devices.
328The test unit ready command does not read or write any data.
329It merely causes the device to return its status.
330.It duration[4]
331This array contains the total bintime corresponding to completed operations of
332a given type.
333The indices are identical to those for
334.Fa bytes
335above.
336(Operations that complete using the historical
337.Fn devstat_end_transaction
338API and do not provide a non-NULL
339.Fa then
340are not accounted for.)
341.It busy_time
342This is the amount of time that the device busy count has been greater than
343zero.
344This is only updated when the busy count returns to zero.
345.It creation_time
346This is the time, as reported by
347.Fn getmicrotime
348that the device was registered.
349.It block_size
350This is the block size of the device, if the device has a block size.
351.It tag_types
352This is an array of counters to record the number of various tag types that
353are sent to a device.
354See below for a list of tag types.
355.It busy_from
356If the device is not busy, this was the time that a transaction last completed.
357If the device is busy, this the most recent of either the time that the device
358became busy, or the time that the last transaction completed.
359.It flags
360These flags indicate which statistics measurements are supported by a
361particular device.
362These flags are primarily intended to serve as an aid
363to userland programs that decipher the statistics.
364.It device_type
365This is the device type.
366It consists of three parts: the device type
367(e.g., direct access, CDROM, sequential access, etc.), the interface (IDE,
368SCSI or other) and whether or not the device in question is a pass-through
369driver.
370See below for a complete list of device types.
371.It priority
372This is the priority.
373This is the first parameter used to determine where
374to insert a device in the
375.Nm
376list.
377The second parameter is attach order.
378See below for a list of available priorities.
379.It id
380Identification for GEOM nodes.
381.El
382.Pp
383Each device is given a device type.
384Pass-through devices have the same underlying device type and interface as the
385device they provide an interface for, but they also have the pass-through flag
386set.
387The base device types are identical to the
388.Tn SCSI
389device type numbers, so with
390.Tn SCSI
391peripherals, the device type returned from an inquiry is usually ORed with the
392.Tn SCSI
393interface type and the pass-through flag if appropriate.
394The device type
395flags are as follows:
396.Bd -literal -offset indent
397typedef enum {
398	DEVSTAT_TYPE_DIRECT	= 0x000,
399	DEVSTAT_TYPE_SEQUENTIAL	= 0x001,
400	DEVSTAT_TYPE_PRINTER	= 0x002,
401	DEVSTAT_TYPE_PROCESSOR	= 0x003,
402	DEVSTAT_TYPE_WORM	= 0x004,
403	DEVSTAT_TYPE_CDROM	= 0x005,
404	DEVSTAT_TYPE_SCANNER	= 0x006,
405	DEVSTAT_TYPE_OPTICAL	= 0x007,
406	DEVSTAT_TYPE_CHANGER	= 0x008,
407	DEVSTAT_TYPE_COMM	= 0x009,
408	DEVSTAT_TYPE_ASC0	= 0x00a,
409	DEVSTAT_TYPE_ASC1	= 0x00b,
410	DEVSTAT_TYPE_STORARRAY	= 0x00c,
411	DEVSTAT_TYPE_ENCLOSURE	= 0x00d,
412	DEVSTAT_TYPE_FLOPPY	= 0x00e,
413	DEVSTAT_TYPE_MASK	= 0x00f,
414	DEVSTAT_TYPE_IF_SCSI	= 0x010,
415	DEVSTAT_TYPE_IF_IDE	= 0x020,
416	DEVSTAT_TYPE_IF_OTHER	= 0x030,
417	DEVSTAT_TYPE_IF_NVME	= 0x040,
418	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
419	DEVSTAT_TYPE_PASS	= 0x100
420} devstat_type_flags;
421.Ed
422.Pp
423Devices have a priority associated with them, which controls roughly where
424they are placed in the
425.Nm
426list.
427The priorities are as follows:
428.Bd -literal -offset indent
429typedef enum {
430	DEVSTAT_PRIORITY_MIN	= 0x000,
431	DEVSTAT_PRIORITY_OTHER	= 0x020,
432	DEVSTAT_PRIORITY_PASS	= 0x030,
433	DEVSTAT_PRIORITY_FD	= 0x040,
434	DEVSTAT_PRIORITY_WFD	= 0x050,
435	DEVSTAT_PRIORITY_TAPE	= 0x060,
436	DEVSTAT_PRIORITY_CD	= 0x090,
437	DEVSTAT_PRIORITY_DISK	= 0x110,
438	DEVSTAT_PRIORITY_ARRAY	= 0x120,
439	DEVSTAT_PRIORITY_MAX	= 0xfff
440} devstat_priority;
441.Ed
442.Pp
443Each device has associated with it flags to indicate what operations are
444supported or not supported.
445The
446.Va devstat_support_flags
447values are as follows:
448.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
449.It DEVSTAT_ALL_SUPPORTED
450Every statistic type is supported by the device.
451.It DEVSTAT_NO_BLOCKSIZE
452This device does not have a blocksize.
453.It DEVSTAT_NO_ORDERED_TAGS
454This device does not support ordered tags.
455.It DEVSTAT_BS_UNAVAILABLE
456This device supports a blocksize, but it is currently unavailable.
457This
458flag is most often used with removable media drives.
459.El
460.Pp
461Transactions to a device fall into one of three categories, which are
462represented in the
463.Va flags
464passed into
465.Fn devstat_end_transaction .
466The transaction types are as follows:
467.Bd -literal -offset indent
468typedef enum {
469	DEVSTAT_NO_DATA	= 0x00,
470	DEVSTAT_READ	= 0x01,
471	DEVSTAT_WRITE	= 0x02,
472	DEVSTAT_FREE	= 0x03
473} devstat_trans_flags;
474#define DEVSTAT_N_TRANS_FLAGS   4
475.Ed
476.Pp
477DEVSTAT_NO_DATA is a type of transactions to the device which are neither
478reads or writes.
479For instance,
480.Tn SCSI
481drivers often send a test unit ready command to
482.Tn SCSI
483devices.
484The test unit ready command does not read or write any data.
485It merely causes the device to return its status.
486.Pp
487There are four possible values for the
488.Va tag_type
489argument to
490.Fn devstat_end_transaction :
491.Bl -tag -width DEVSTAT_TAG_ORDERED
492.It DEVSTAT_TAG_SIMPLE
493The transaction had a simple tag.
494.It DEVSTAT_TAG_HEAD
495The transaction had a head of queue tag.
496.It DEVSTAT_TAG_ORDERED
497The transaction had an ordered tag.
498.It DEVSTAT_TAG_NONE
499The device does not support tags.
500.El
501.Pp
502The tag type values correspond to the lower four bits of the
503.Tn SCSI
504tag definitions.
505In CAM, for instance, the
506.Va tag_action
507from the CCB is ORed with 0xf to determine the tag type to pass in to
508.Fn devstat_end_transaction .
509.Pp
510There is a macro,
511.Dv DEVSTAT_VERSION
512that is defined in
513.In sys/devicestat.h .
514This is the current version of the
515.Nm
516subsystem, and it should be incremented each time a change is made that
517would require recompilation of userland programs that access
518.Nm
519statistics.
520Userland programs use this version, via the
521.Va kern.devstat.version
522.Nm sysctl
523variable to determine whether they are in sync with the kernel
524.Nm
525structures.
526.Sh SEE ALSO
527.Xr systat 1 ,
528.Xr devstat 3 ,
529.Xr iostat 8 ,
530.Xr rpc.rstatd 8 ,
531.Xr vmstat 8
532.Sh HISTORY
533The
534.Nm
535statistics system appeared in
536.Fx 3.0 .
537.Sh AUTHORS
538.An Kenneth Merry Aq Mt ken@FreeBSD.org
539.Sh BUGS
540There may be a need for
541.Fn spl
542protection around some of the
543.Nm
544list manipulation code to ensure, for example, that the list of devices
545is not changed while someone is fetching the
546.Va kern.devstat.all
547.Nm sysctl
548variable.
549