.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 15, 2020
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_new_entry ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft struct devstat *
.Fo devstat_new_entry
.Fa "const void *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
The devstat subsystem is an interface for recording device
statistics, as its name implies.
The idea is to keep reasonably detailed
statistics while utilizing a minimum amount of CPU time to record them.
Thus, no statistical calculations are actually performed in the kernel
portion of the
.Nm
code.
Instead, that is left for user programs to handle.
.Pp
The historical and antiquated
.Nm
model assumed a single active IO operation per device, which is not accurate
for most disk-like drivers in the 2000s and beyond.
New consumers of the interface should almost certainly use only the "bio"
variants of the start and end transaction routines.
.Pp
.Fn devstat_new_entry
allocates and initializes a
.Va devstat
structure and returns a pointer to it.
.Fn devstat_new_entry
takes several arguments:
.Bl -tag -width device_type
.It dev_name
The device name, e.g., da, cd, sa.
.It unit_number
Device unit number.
.It block_size
Block size of the device, if supported.
If the device does not support a
block size, or if the block size is unknown at the time the device is added
to the
.Nm
list, it should be set to 0.
.It flags
Flags indicating operations supported or not supported by the device.
See below for details.
.It device_type
The device type.
This is broken into three sections: base device type
(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
or other) and a pass-through flag to indicate pass-through devices.
See below for a complete list of types.
.It priority
The device priority.
The priority is used to determine how devices are
sorted within
.Nm devstat Ns 's
list of devices.
Devices are sorted first by priority (highest to lowest),
and then by attach order.
See below for a complete list of available
priorities.
.El
.Pp
.Fn devstat_remove_entry
removes a device from the
.Nm
subsystem.
It takes the devstat structure for the device in question as
an argument.
The
.Nm
generation number is incremented and the number of devices is decremented.
.Pp
.Fn devstat_start_transaction
registers the start of a transaction with the
.Nm
subsystem.
Optionally, if the caller already has a
.Fn binuptime
value available, it may be passed in
.Fa *now .
Usually the caller can just pass
.Dv NULL
for
.Fa now ,
and the routine will gather the current
.Fn binuptime
itself.
The busy count is incremented with each transaction start.
When a device goes from idle to busy, the system uptime is recorded in the
.Va busy_from
field of the
.Va devstat
structure.
.Pp
.Fn devstat_start_transaction_bio
records the
.Fn binuptime
in the provided bio's
.Fa bio_t0
and then invokes
.Fn devstat_start_transaction .
.Pp
.Fn devstat_end_transaction
registers the end of a transaction with the
.Nm
subsystem.
It takes six arguments:
.Bl -tag -width tag_type
.It ds
The
.Va devstat
structure for the device in question.
.It bytes
The number of bytes transferred in this transaction.
.It tag_type
Transaction tag type.
See below for tag types.
.It flags
Transaction flags indicating whether the transaction was a read, write, or
whether no data was transferred.
.It now
The
.Fn binuptime
at the end of the transaction, or
.Dv NULL .
.It then
The
.Fn binuptime
at the beginning of the transaction, or
.Dv NULL .
.El
.Pp
If
.Fa now
is
.Dv NULL ,
it collects the current time from
.Fn binuptime .
If
.Fa then
is
.Dv NULL ,
the operation is not tracked in the
.Va devstat
.Fa duration
table.
.Pp
.Fn devstat_end_transaction_bio
is a thin wrapper for
.Fn devstat_end_transaction_bio_bt
with a
.Dv NULL
.Fa now
parameter.
.Pp
.Fn devstat_end_transaction_bio_bt
is a wrapper for
.Fn devstat_end_transaction
which pulls all needed information from a
.Va "struct bio"
prepared by
.Fn devstat_start_transaction_bio .
The bio must be ready for
.Fn biodone
(i.e.,
.Fa bio_bcount
and
.Fa bio_resid
must be correctly initialized).
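.Pp
The pairing of start and end events can be illustrated with a small
userland model.
This is only a sketch under stated assumptions, not the kernel implementation:
the
.Va model_devstat
structure and the
.Fn model_start ,
.Fn model_end
and
.Fn model_busy_count
helpers are hypothetical names invented for this example, and timestamps
are left out entirely.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>

/* Transaction categories; values mirror devstat_trans_flags in this page. */
enum model_trans {
	MODEL_NO_DATA = 0,
	MODEL_READ = 1,
	MODEL_WRITE = 2,
	MODEL_FREE = 3,
	MODEL_N_TRANS = 4
};

/* Hypothetical, much reduced stand-in for struct devstat. */
struct model_devstat {
	uint64_t start_count;			/* operations started */
	uint64_t end_count;			/* operations completed */
	uint64_t bytes[MODEL_N_TRANS];		/* bytes moved, per category */
	uint64_t operations[MODEL_N_TRANS];	/* completions, per category */
};

/* Models the counting done when a transaction starts. */
void
model_start(struct model_devstat *ds)
{
	ds->start_count++;
}

/* Models the counting done when a transaction ends. */
void
model_end(struct model_devstat *ds, uint32_t nbytes, enum model_trans type)
{
	/* Every end event must be paired with an earlier start event. */
	assert(ds->end_count < ds->start_count);
	ds->bytes[type] += nbytes;
	ds->operations[type]++;
	ds->end_count++;
}

/* Outstanding transactions: start_count minus end_count. */
uint64_t
model_busy_count(const struct model_devstat *ds)
{
	return (ds->start_count - ds->end_count);
}
```
.Ed
.Pp
In this model, two calls to
.Fn model_start
followed by one
.Fn model_end
for a 512 byte read leave one transaction outstanding.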
.Pp
The
.Va devstat
structure is composed of the following fields:
.Bl -tag -width dev_creation_time
.It sequence0 ,
.It sequence1
An implementation detail used to gather consistent snapshots of device
statistics.
.It start_count
Number of operations started.
.It end_count
Number of operations completed.
The
.Dq busy_count
can be calculated by subtracting
.Fa end_count
from
.Fa start_count .
.Fa ( sequence0
and
.Fa sequence1
are used to get a consistent snapshot.)
This is the current number of outstanding transactions for the device.
This should never go below zero, and on an idle device it should be zero.
If either one of these conditions is not true, it indicates a problem.
.Pp
There should be one and only one
transaction start event and one transaction end event for each transaction.
.It dev_links
Each
.Va devstat
structure is placed in a linked list when it is registered.
The
.Va dev_links
field contains a pointer to the next entry in the list of
.Va devstat
structures.
.It device_number
The device number is a unique identifier for each device.
The device
number is incremented for each new device that is registered.
The device
number is currently only a 32-bit integer, but it could be enlarged if
someone has a system with more than four billion device arrival events.
.It device_name
The device name is a text string given by the registering driver to
identify itself.
(e.g.,
.Dq da ,
.Dq cd ,
.Dq sa ,
etc.)
.It unit_number
The unit number identifies the particular instance of the peripheral driver
in question.
.It bytes[4]
This array contains the number of bytes that have been read (index
.Dv DEVSTAT_READ ) ,
written (index
.Dv DEVSTAT_WRITE ) ,
freed or erased (index
.Dv DEVSTAT_FREE ) ,
or other (index
.Dv DEVSTAT_NO_DATA ) .
All values are unsigned 64-bit integers.
.It operations[4]
This array contains the number of operations of a given type that have been
performed.
The indices are identical to those for
.Fa bytes
above.
.Dv DEVSTAT_NO_DATA
or "other" represents the number of transactions to the device which are
neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.It duration[4]
This array contains the total bintime corresponding to completed operations of
a given type.
The indices are identical to those for
.Fa bytes
above.
(Operations that complete using the historical
.Fn devstat_end_transaction
API and do not provide a non-NULL
.Fa then
are not accounted for.)
.It busy_time
This is the amount of time that the device busy count has been greater than
zero.
This is only updated when the busy count returns to zero.
.It creation_time
This is the time, as reported by
.Fn getmicrotime ,
that the device was registered.
.It block_size
This is the block size of the device, if the device has a block size.
.It tag_types
This is an array of counters to record the number of various tag types that
are sent to a device.
See below for a list of tag types.
.It busy_from
If the device is not busy, this was the time that a transaction last completed.
If the device is busy, this is the most recent of either the time that the
device became busy, or the time that the last transaction completed.
.It flags
These flags indicate which statistics measurements are supported by a
particular device.
These flags are primarily intended to serve as an aid
to userland programs that decipher the statistics.
.It device_type
This is the device type.
It consists of three parts: the device type
(e.g., direct access, CDROM, sequential access, etc.), the interface (IDE,
SCSI or other) and whether or not the device in question is a pass-through
driver.
See below for a complete list of device types.
.It priority
This is the priority.
This is the first parameter used to determine where
to insert a device in the
.Nm
list.
The second parameter is attach order.
See below for a list of available priorities.
.It id
Identification for GEOM nodes.
.El
.Pp
Each device is given a device type.
Pass-through devices have the same underlying device type and interface as the
device they provide an interface for, but they also have the pass-through flag
set.
The base device types are identical to the
.Tn SCSI
device type numbers, so with
.Tn SCSI
peripherals, the device type returned from an inquiry is usually ORed with the
.Tn SCSI
interface type and the pass-through flag if appropriate.
The device type
flags are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_SEQUENTIAL	= 0x001,
	DEVSTAT_TYPE_PRINTER	= 0x002,
	DEVSTAT_TYPE_PROCESSOR	= 0x003,
	DEVSTAT_TYPE_WORM	= 0x004,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_SCANNER	= 0x006,
	DEVSTAT_TYPE_OPTICAL	= 0x007,
	DEVSTAT_TYPE_CHANGER	= 0x008,
	DEVSTAT_TYPE_COMM	= 0x009,
	DEVSTAT_TYPE_ASC0	= 0x00a,
	DEVSTAT_TYPE_ASC1	= 0x00b,
	DEVSTAT_TYPE_STORARRAY	= 0x00c,
	DEVSTAT_TYPE_ENCLOSURE	= 0x00d,
	DEVSTAT_TYPE_FLOPPY	= 0x00e,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_OTHER	= 0x030,
	DEVSTAT_TYPE_IF_NVME	= 0x040,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
} devstat_type_flags;
.Ed
.Pp
Devices have a priority associated with them, which controls roughly where
they are placed in the
.Nm
list.
The priorities are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_PRIORITY_MIN	= 0x000,
	DEVSTAT_PRIORITY_OTHER	= 0x020,
	DEVSTAT_PRIORITY_PASS	= 0x030,
	DEVSTAT_PRIORITY_FD	= 0x040,
	DEVSTAT_PRIORITY_WFD	= 0x050,
	DEVSTAT_PRIORITY_TAPE	= 0x060,
	DEVSTAT_PRIORITY_CD	= 0x090,
	DEVSTAT_PRIORITY_DISK	= 0x110,
	DEVSTAT_PRIORITY_ARRAY	= 0x120,
	DEVSTAT_PRIORITY_MAX	= 0xfff
} devstat_priority;
.Ed
.Pp
Each device has associated with it flags to indicate what operations are
supported or not supported.
The
.Va devstat_support_flags
values are as follows:
.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
.It DEVSTAT_ALL_SUPPORTED
Every statistic type is supported by the device.
.It DEVSTAT_NO_BLOCKSIZE
This device does not have a blocksize.
.It DEVSTAT_NO_ORDERED_TAGS
This device does not support ordered tags.
.It DEVSTAT_BS_UNAVAILABLE
This device supports a blocksize, but it is currently unavailable.
This
flag is most often used with removable media drives.
.El
.Pp
Transactions to a device fall into one of four categories, which are
represented in the
.Va flags
passed into
.Fn devstat_end_transaction .
The transaction types are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;
#define DEVSTAT_N_TRANS_FLAGS	4
.Ed
.Pp
.Dv DEVSTAT_NO_DATA
covers transactions to the device that are neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.Pp
There are four possible values for the
.Va tag_type
argument to
.Fn devstat_end_transaction :
.Bl -tag -width DEVSTAT_TAG_ORDERED
.It DEVSTAT_TAG_SIMPLE
The transaction had a simple tag.
.It DEVSTAT_TAG_HEAD
The transaction had a head of queue tag.
.It DEVSTAT_TAG_ORDERED
The transaction had an ordered tag.
.It DEVSTAT_TAG_NONE
The device does not support tags.
.El
.Pp
The tag type values correspond to the lower four bits of the
.Tn SCSI
tag definitions.
In CAM, for instance, the
.Va tag_action
from the CCB is ANDed with 0xf to determine the tag type to pass in to
.Fn devstat_end_transaction .
.Pp
There is a macro,
.Dv DEVSTAT_VERSION ,
that is defined in
.In sys/devicestat.h .
This is the current version of the
.Nm
subsystem, and it should be incremented each time a change is made that
would require recompilation of userland programs that access
.Nm
statistics.
Userland programs use this version, via the
.Va kern.devstat.version
.Nm sysctl
variable, to determine whether they are in sync with the kernel
.Nm
structures.
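.Pp
As an illustration of the three-part device type layout described earlier
in this section, the following userland sketch composes and decomposes a
.Va device_type
value using the two masks and the pass-through flag.
The constants are copied from the
.Va devstat_type_flags
listing; the
.Fn example_make_type ,
.Fn example_base_type ,
.Fn example_if_type
and
.Fn example_is_pass
helpers are hypothetical and are not part of the
.Nm
interface.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>

/* Subset of devstat_type_flags, copied from the listing in this page. */
enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
};

/* Hypothetical helper: OR the three parts into one device type value. */
int
example_make_type(int base, int iface, bool pass)
{
	/* Each part must fit inside its own bit field. */
	assert((base & ~DEVSTAT_TYPE_MASK) == 0);
	assert((iface & ~DEVSTAT_TYPE_IF_MASK) == 0);
	return (base | iface | (pass ? DEVSTAT_TYPE_PASS : 0));
}

/* Hypothetical helper: extract the base device type. */
int
example_base_type(int device_type)
{
	return (device_type & DEVSTAT_TYPE_MASK);
}

/* Hypothetical helper: extract the interface type. */
int
example_if_type(int device_type)
{
	return (device_type & DEVSTAT_TYPE_IF_MASK);
}

/* Hypothetical helper: test the pass-through flag. */
bool
example_is_pass(int device_type)
{
	return ((device_type & DEVSTAT_TYPE_PASS) != 0);
}
```
.Ed
.Pp
Under these assumptions, a SCSI CDROM accessed through a pass-through driver
would carry all three parts in a single value, and each helper recovers one
of them.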
.Sh SEE ALSO
.Xr systat 1 ,
.Xr devstat 3 ,
.Xr iostat 8 ,
.Xr rpc.rstatd 8 ,
.Xr vmstat 8
.Sh HISTORY
The
.Nm
statistics system appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
There may be a need for
.Fn spl
protection around some of the
.Nm
list manipulation code to ensure, for example, that the list of devices
is not changed while someone is fetching the
.Va kern.devstat.all
.Nm sysctl
variable.