.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 15, 2020
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_new_entry ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft struct devstat *
.Fo devstat_new_entry
.Fa "const void *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
The devstat subsystem is an interface for recording device
statistics, as its name implies.
The idea is to keep reasonably detailed
statistics while utilizing a minimum amount of CPU time to record them.
Thus, no statistical calculations are actually performed in the kernel
portion of the
.Nm
code.
Instead, that is left for user programs to handle.
.Pp
The historical and antiquated
.Nm
model assumed a single active IO operation per device, which is not accurate
for most disk-like drivers in the 2000s and beyond.
New consumers of the interface should almost certainly use only the "bio"
variants of the start and end transaction routines.
.Pp
.Fn devstat_new_entry
allocates and initializes a
.Va devstat
structure and returns a pointer to it.
.Fn devstat_new_entry
takes several arguments:
.Bl -tag -width device_type
.It dev_name
The device name, e.g., da, cd, sa.
.It unit_number
Device unit number.
.It block_size
Block size of the device, if supported.
If the device does not support a
block size, or if the block size is unknown at the time the device is added
to the
.Nm
list, it should be set to 0.
.It flags
Flags indicating operations supported or not supported by the device.
See below for details.
.It device_type
The device type.
This is broken into three sections: base device type
(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
or other) and a pass-through flag to indicate pass-through devices.
See below for a complete list of types.
.It priority
The device priority.
The priority is used to determine how devices are
sorted within
.Nm devstat Ns 's
list of devices.
Devices are sorted first by priority (highest to lowest),
and then by attach order.
See below for a complete list of available
priorities.
.El
.Pp
.Fn devstat_remove_entry
removes a device from the
.Nm
subsystem.
It takes the devstat structure for the device in question as
an argument.
The
.Nm
generation number is incremented and the number of devices is decremented.
.Pp
.Fn devstat_start_transaction
registers the start of a transaction with the
.Nm
subsystem.
Optionally, if the caller already has a
.Fn binuptime
value available, it may be passed in
.Fa now .
Usually the caller can just pass
.Dv NULL
for
.Fa now ,
and the routine will gather the current
.Fn binuptime
itself.
The busy count is incremented with each transaction start.
When a device goes from idle to busy, the system uptime is recorded in the
.Va busy_from
field of the
.Va devstat
structure.
.Pp
.Fn devstat_start_transaction_bio
records the
.Fn binuptime
in the provided bio's
.Fa bio_t0
and then invokes
.Fn devstat_start_transaction .
.Pp
.Fn devstat_end_transaction
registers the end of a transaction with the
.Nm
subsystem.
It takes six arguments:
.Bl -tag -width tag_type
.It ds
The
.Va devstat
structure for the device in question.
.It bytes
The number of bytes transferred in this transaction.
.It tag_type
Transaction tag type.
See below for tag types.
.It flags
Transaction flags indicating whether the transaction was a read, write, or
whether no data was transferred.
.It now
The
.Fn binuptime
at the end of the transaction, or
.Dv NULL .
.It then
The
.Fn binuptime
at the beginning of the transaction, or
.Dv NULL .
.El
.Pp
If
.Fa now
is
.Dv NULL ,
it collects the current time from
.Fn binuptime .
If
.Fa then
is
.Dv NULL ,
the operation is not tracked in the
.Va devstat
.Fa duration
table.
.Pp
.Fn devstat_end_transaction_bio
is a thin wrapper for
.Fn devstat_end_transaction_bio_bt
with a
.Dv NULL
.Fa now
parameter.
.Pp
.Fn devstat_end_transaction_bio_bt
is a wrapper for
.Fn devstat_end_transaction
which pulls all needed information from a
.Va "struct bio"
prepared by
.Fn devstat_start_transaction_bio .
The bio must be ready for
.Fn biodone
(i.e.,
.Fa bio_bcount
and
.Fa bio_resid
must be correctly initialized).
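.Pp
As a minimal sketch of how the bio variants described above fit together,
a hypothetical disk-like driver might bracket each request as follows.
The
.Vt foo_softc
structure, the
.Fn foo_attach ,
.Fn foo_strategy ,
.Fn foo_done ,
and
.Fn foo_detach
routines, the device name
.Dq foo ,
and the 512-byte block size are illustrative assumptions and not part of the
.Nm
interface; locking and error handling are omitted.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/devicestat.h>

/* Hypothetical per-device state; only the devstat pointer matters here. */
struct foo_softc {
	struct devstat	*sc_stats;
};

static void
foo_attach(struct foo_softc *sc, int unit)
{
	/* Register the device; the block size of 512 is assumed. */
	sc->sc_stats = devstat_new_entry("foo", unit, 512,
	    DEVSTAT_NO_ORDERED_TAGS,
	    DEVSTAT_TYPE_DIRECT | DEVSTAT_TYPE_IF_OTHER,
	    DEVSTAT_PRIORITY_DISK);
}

static void
foo_strategy(struct foo_softc *sc, struct bio *bp)
{
	/* Stamps bp->bio_t0 and records the transaction start. */
	devstat_start_transaction_bio(sc->sc_stats, bp);
	/* Driver-specific: hand bp to the hardware here. */
}

static void
foo_done(struct foo_softc *sc, struct bio *bp)
{
	/* bio_bcount and bio_resid must already be final. */
	devstat_end_transaction_bio(sc->sc_stats, bp);
	biodone(bp);
}

static void
foo_detach(struct foo_softc *sc)
{
	devstat_remove_entry(sc->sc_stats);
	sc->sc_stats = NULL;
}
.Ed
.Pp
Pairing exactly one start call with one end call per bio keeps the busy
count consistent.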
.Pp
The
.Va devstat
structure is composed of the following fields:
.Bl -tag -width dev_creation_time
.It sequence0 ,
.It sequence1
An implementation detail used to gather consistent snapshots of device
statistics.
.It start_count
Number of operations started.
.It end_count
Number of operations completed.
The
.Dq busy_count
can be calculated by subtracting
.Fa end_count
from
.Fa start_count .
.Fa ( sequence0
and
.Fa sequence1
are used to get a consistent snapshot.)
This is the current number of outstanding transactions for the device.
This should never go below zero, and on an idle device it should be zero.
If either one of these conditions is not true, it indicates a problem.
.Pp
There should be one and only one
transaction start event and one transaction end event for each transaction.
.It dev_links
Each
.Va devstat
structure is placed in a linked list when it is registered.
The
.Va dev_links
field contains a pointer to the next entry in the list of
.Va devstat
structures.
.It device_number
The device number is a unique identifier for each device.
The device
number is incremented for each new device that is registered.
The device
number is currently only a 32-bit integer, but it could be enlarged if
someone has a system with more than four billion device arrival events.
.It device_name
The device name is a text string given by the registering driver to
identify itself.
(e.g.,
.Dq da ,
.Dq cd ,
.Dq sa ,
etc.)
.It unit_number
The unit number identifies the particular instance of the peripheral driver
in question.
.It bytes[4]
This array contains the number of bytes that have been read (index
.Dv DEVSTAT_READ ) ,
written (index
.Dv DEVSTAT_WRITE ) ,
freed or erased (index
.Dv DEVSTAT_FREE ) ,
or other (index
.Dv DEVSTAT_NO_DATA ) .
All values are unsigned 64-bit integers.
.It operations[4]
This array contains the number of operations of a given type that have been
performed.
The indices are identical to those for
.Fa bytes
above.
.Dv DEVSTAT_NO_DATA
or "other" represents the number of transactions to the device which are
neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.It duration[4]
This array contains the total bintime corresponding to completed operations of
a given type.
The indices are identical to those for
.Fa bytes
above.
(Operations that complete using the historical
.Fn devstat_end_transaction
API and do not provide a non-NULL
.Fa then
are not accounted for.)
.It busy_time
This is the amount of time that the device busy count has been greater than
zero.
This is only updated when the busy count returns to zero.
.It creation_time
This is the time, as reported by
.Fn getmicrotime ,
that the device was registered.
.It block_size
This is the block size of the device, if the device has a block size.
.It tag_types
This is an array of counters to record the number of various tag types that
are sent to a device.
See below for a list of tag types.
.It busy_from
If the device is not busy, this was the time that a transaction last completed.
If the device is busy, this is the most recent of either the time that the
device became busy, or the time that the last transaction completed.
.It flags
These flags indicate which statistics measurements are supported by a
particular device.
These flags are primarily intended to serve as an aid
to userland programs that decipher the statistics.
.It device_type
This is the device type.
It consists of three parts: the device type
(e.g., direct access, CDROM, sequential access, etc.), the interface (IDE,
SCSI or other) and whether or not the device in question is a pass-through
driver.
See below for a complete list of device types.
.It priority
This is the priority.
This is the first parameter used to determine where
to insert a device in the
.Nm
list.
The second parameter is attach order.
See below for a list of available priorities.
.It id
Identification for GEOM nodes.
.El
.Pp
Each device is given a device type.
Pass-through devices have the same underlying device type and interface as the
device they provide an interface for, but they also have the pass-through flag
set.
The base device types are identical to the
.Tn SCSI
device type numbers, so with
.Tn SCSI
peripherals, the device type returned from an inquiry is usually ORed with the
.Tn SCSI
interface type and the pass-through flag if appropriate.
The device type
flags are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_SEQUENTIAL	= 0x001,
	DEVSTAT_TYPE_PRINTER	= 0x002,
	DEVSTAT_TYPE_PROCESSOR	= 0x003,
	DEVSTAT_TYPE_WORM	= 0x004,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_SCANNER	= 0x006,
	DEVSTAT_TYPE_OPTICAL	= 0x007,
	DEVSTAT_TYPE_CHANGER	= 0x008,
	DEVSTAT_TYPE_COMM	= 0x009,
	DEVSTAT_TYPE_ASC0	= 0x00a,
	DEVSTAT_TYPE_ASC1	= 0x00b,
	DEVSTAT_TYPE_STORARRAY	= 0x00c,
	DEVSTAT_TYPE_ENCLOSURE	= 0x00d,
	DEVSTAT_TYPE_FLOPPY	= 0x00e,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_OTHER	= 0x030,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
} devstat_type_flags;
.Ed
.Pp
Devices have a priority associated with them, which controls roughly where
they are placed in the
.Nm
list.
The priorities are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_PRIORITY_MIN	= 0x000,
	DEVSTAT_PRIORITY_OTHER	= 0x020,
	DEVSTAT_PRIORITY_PASS	= 0x030,
	DEVSTAT_PRIORITY_FD	= 0x040,
	DEVSTAT_PRIORITY_WFD	= 0x050,
	DEVSTAT_PRIORITY_TAPE	= 0x060,
	DEVSTAT_PRIORITY_CD	= 0x090,
	DEVSTAT_PRIORITY_DISK	= 0x110,
	DEVSTAT_PRIORITY_ARRAY	= 0x120,
	DEVSTAT_PRIORITY_MAX	= 0xfff
} devstat_priority;
.Ed
.Pp
Each device has associated with it flags to indicate what operations are
supported or not supported.
The
.Va devstat_support_flags
values are as follows:
.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
.It DEVSTAT_ALL_SUPPORTED
Every statistic type is supported by the device.
.It DEVSTAT_NO_BLOCKSIZE
This device does not have a block size.
.It DEVSTAT_NO_ORDERED_TAGS
This device does not support ordered tags.
.It DEVSTAT_BS_UNAVAILABLE
This device supports a block size, but it is currently unavailable.
This
flag is most often used with removable media drives.
.El
.Pp
Transactions to a device fall into one of four categories, which are
represented in the
.Va flags
passed into
.Fn devstat_end_transaction .
The transaction types are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;
#define DEVSTAT_N_TRANS_FLAGS	4
.Ed
.Pp
.Dv DEVSTAT_NO_DATA
covers transactions to the device that are neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
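.Pp
For a caller of the historical
.Fn devstat_end_transaction
interface, the choice of transaction flag can be sketched as below.
This is only an illustration under stated assumptions: the
.Fn foo_trans_flags
and
.Fn foo_end
helpers are hypothetical, the mapping from
.Fa bio_cmd
values is one plausible policy rather than a statement of what the kernel
does internally, and the tag type is fixed to
.Dv DEVSTAT_TAG_SIMPLE
for brevity.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/devicestat.h>

/* Hypothetical helper: pick a transaction flag for a bio. */
static devstat_trans_flags
foo_trans_flags(const struct bio *bp)
{
	switch (bp->bio_cmd) {
	case BIO_READ:
		return (DEVSTAT_READ);
	case BIO_WRITE:
		return (DEVSTAT_WRITE);
	case BIO_DELETE:
		return (DEVSTAT_FREE);
	default:
		/* Neither a read, a write, nor a free. */
		return (DEVSTAT_NO_DATA);
	}
}

/* Hypothetical helper: end a transaction started for bp. */
static void
foo_end(struct devstat *ds, const struct bio *bp)
{
	/* Bytes actually moved; bio_resid must already be final. */
	uint32_t bytes = bp->bio_bcount - bp->bio_resid;

	/* NULL for now lets the routine call binuptime() itself. */
	devstat_end_transaction(ds, bytes, DEVSTAT_TAG_SIMPLE,
	    foo_trans_flags(bp), NULL, &bp->bio_t0);
}
.Ed
.Pp
Passing the start time recorded in
.Fa bio_t0
as
.Fa then
is what allows the operation to be accounted in the
.Fa duration
table.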
.Pp
There are four possible values for the
.Va tag_type
argument to
.Fn devstat_end_transaction :
.Bl -tag -width DEVSTAT_TAG_ORDERED
.It DEVSTAT_TAG_SIMPLE
The transaction had a simple tag.
.It DEVSTAT_TAG_HEAD
The transaction had a head of queue tag.
.It DEVSTAT_TAG_ORDERED
The transaction had an ordered tag.
.It DEVSTAT_TAG_NONE
The device does not support tags.
.El
.Pp
The tag type values correspond to the lower four bits of the
.Tn SCSI
tag definitions.
In CAM, for instance, the
.Va tag_action
from the CCB is ANDed with 0xf to determine the tag type to pass in to
.Fn devstat_end_transaction .
.Pp
There is a macro,
.Dv DEVSTAT_VERSION ,
that is defined in
.In sys/devicestat.h .
This is the current version of the
.Nm
subsystem, and it should be incremented each time a change is made that
would require recompilation of userland programs that access
.Nm
statistics.
Userland programs use this version, via the
.Va kern.devstat.version
.Nm sysctl
variable, to determine whether they are in sync with the kernel
.Nm
structures.
.Sh SEE ALSO
.Xr systat 1 ,
.Xr devstat 3 ,
.Xr iostat 8 ,
.Xr rpc.rstatd 8 ,
.Xr vmstat 8
.Sh HISTORY
The
.Nm
statistics system appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
There may be a need for
.Fn spl
protection around some of the
.Nm
list manipulation code to ensure, for example, that the list of devices
is not changed while someone is fetching the
.Va kern.devstat.all
.Nm sysctl
variable.