.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 22, 2018
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_add_entry ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft void
.Fo devstat_add_entry
.Fa "struct devstat *ds"
.Fa "const char *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
The devstat subsystem is an interface for recording device
statistics, as its name implies.
The idea is to keep reasonably detailed
statistics while utilizing a minimum amount of CPU time to record them.
Thus, no statistical calculations are actually performed in the kernel
portion of the
.Nm
code.
Instead, that is left for user programs to handle.
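.Pp
As an illustration of the arithmetic that is left to user programs, the
following C sketch derives read throughput and mean read latency from two
successive samples of one device's counters.
The
.Va "struct dev_sample"
type and both helper functions are hypothetical; the field comments name the
.Va devstat
fields the values are assumed to be copied from, and the read duration is
assumed to have already been converted from a bintime to seconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified sample of one device's read counters.
 * This is not the real struct devstat layout.
 */
struct dev_sample {
	uint64_t bytes_read;	/* assumed copy of bytes[DEVSTAT_READ] */
	uint64_t read_ops;	/* assumed copy of operations[DEVSTAT_READ] */
	double	 read_secs;	/* duration[DEVSTAT_READ], converted to seconds */
};

/* Read throughput in bytes per second over an interval of "secs" seconds. */
static double
read_bytes_per_sec(const struct dev_sample *prev, const struct dev_sample *cur,
    double secs)
{
	assert(secs > 0.0);
	return ((double)(cur->bytes_read - prev->bytes_read) / secs);
}

/* Mean latency, in milliseconds, of the reads completed during the interval. */
static double
read_ms_per_op(const struct dev_sample *prev, const struct dev_sample *cur)
{
	uint64_t ops = cur->read_ops - prev->read_ops;

	if (ops == 0)
		return (0.0);	/* no reads completed; avoid dividing by zero */
	return ((cur->read_secs - prev->read_secs) * 1000.0 / (double)ops);
}
```

For example, two samples taken two seconds apart that show 2097152 more bytes
read across 200 more completed reads, with one more second of accumulated read
duration, give 1048576 bytes per second and 5 ms per read.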
.Pp
The historical and antiquated
.Nm
model assumed a single active I/O operation per device, which is not accurate
for most disk-like drivers in the 2000s and beyond.
New consumers of the interface should almost certainly use only the "bio"
variants of the start and end transaction routines.
.Pp
.Fn devstat_add_entry
registers a device with the
.Nm
subsystem.
The caller is expected to have already allocated \fBand zeroed\fR
the devstat structure before calling this function.
.Fn devstat_add_entry
takes the following arguments:
.Bl -tag -width device_type
.It ds
The
.Va devstat
structure, allocated and zeroed by the client.
.It dev_name
The device name, e.g., da, cd, sa.
.It unit_number
Device unit number.
.It block_size
Block size of the device, if supported.
If the device does not support a
block size, or if the block size is unknown at the time the device is added
to the
.Nm
list, it should be set to 0.
.It flags
Flags indicating operations supported or not supported by the device.
See below for details.
.It device_type
The device type.
This is broken into three sections: base device type
(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
or other) and a pass-through flag to indicate pass-through devices.
See below for a complete list of types.
.It priority
The device priority.
The priority is used to determine how devices are
sorted within
.Nm devstat Ns 's
list of devices.
Devices are sorted first by priority (highest to lowest),
and then by attach order.
See below for a complete list of available
priorities.
.El
.Pp
.Fn devstat_remove_entry
removes a device from the
.Nm
subsystem.
It takes the devstat structure for the device in question as
an argument.
The
.Nm
generation number is incremented and the number of devices is decremented.
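.Pp
The sorting rule used by
.Fn devstat_add_entry
can be modeled in a few lines of C.
In this hypothetical sketch,
.Va "struct dev_entry"
and
.Fn insert_sorted
are illustrative names, not part of the kernel interface, and the list is
assumed to be singly linked and already sorted.
A new entry is placed after every existing entry whose priority is greater
than or equal to its own, which keeps higher priorities first and preserves
attach order among entries of equal priority.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a registered device; only the priority
 * (a devstat_priority value) and the list link matter for ordering.
 */
struct dev_entry {
	unsigned int	  priority;
	int		  attach_seq;	/* attach order, kept for illustration */
	struct dev_entry *next;
};

/*
 * Insert "e" into a list already sorted by descending priority.
 * Skipping every entry whose priority is >= e->priority places "e"
 * after all earlier-attached entries of equal priority.
 */
static void
insert_sorted(struct dev_entry **head, struct dev_entry *e)
{
	struct dev_entry **pp = head;

	assert(head != NULL && e != NULL);
	while (*pp != NULL && (*pp)->priority >= e->priority)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
}
```

Registering a CD
.Pq Dv DEVSTAT_PRIORITY_CD ,
a disk
.Pq Dv DEVSTAT_PRIORITY_DISK ,
a pass-through device
.Pq Dv DEVSTAT_PRIORITY_PASS
and a second disk, in that order, yields the list order: first disk, second
disk, CD, pass-through device.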
.Pp
.Fn devstat_start_transaction
registers the start of a transaction with the
.Nm
subsystem.
Optionally, if the caller already has a
.Fn binuptime
value available, it may be passed in
.Fa *now .
Usually the caller can just pass
.Dv NULL
for
.Fa now ,
and the routine will gather the current
.Fn binuptime
itself.
The busy count is incremented with each transaction start.
When a device goes from idle to busy, the system uptime is recorded in the
.Va busy_from
field of the
.Va devstat
structure.
.Pp
.Fn devstat_start_transaction_bio
records the
.Fn binuptime
in the provided bio's
.Fa bio_t0
and then invokes
.Fn devstat_start_transaction .
.Pp
.Fn devstat_end_transaction
registers the end of a transaction with the
.Nm
subsystem.
It takes six arguments:
.Bl -tag -width tag_type
.It ds
The
.Va devstat
structure for the device in question.
.It bytes
The number of bytes transferred in this transaction.
.It tag_type
Transaction tag type.
See below for tag types.
.It flags
Transaction flags indicating whether the transaction was a read, a write, or
a free, or whether no data was transferred.
.It now
The
.Fn binuptime
at the end of the transaction, or
.Dv NULL .
.It then
The
.Fn binuptime
at the beginning of the transaction, or
.Dv NULL .
.El
.Pp
If
.Fa now
is
.Dv NULL ,
the routine collects the current time from
.Fn binuptime .
If
.Fa then
is
.Dv NULL ,
the operation is not tracked in the
.Va devstat
.Fa duration
table.
.Pp
.Fn devstat_end_transaction_bio
is a thin wrapper for
.Fn devstat_end_transaction_bio_bt
with a
.Dv NULL
.Fa now
parameter.
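.Pp
The bookkeeping described above for
.Fn devstat_start_transaction
and
.Fn devstat_end_transaction
can be summarized with a simplified userland model.
This is a sketch under stated assumptions, not the kernel implementation:
the structure and function names are hypothetical, times are plain nanosecond
counts instead of
.Vt struct bintime ,
there is no locking or snapshot protection, tags are ignored, and
.Va busy_time
is accumulated only when the busy count returns to zero, following the
description of that field below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transaction types, copied from the devstat_trans_flags list in this page. */
typedef enum {
	DEVSTAT_NO_DATA = 0x00,
	DEVSTAT_READ = 0x01,
	DEVSTAT_WRITE = 0x02,
	DEVSTAT_FREE = 0x03
} devstat_trans_flags;

/*
 * Hypothetical model of the counters touched by a transaction.
 * Times are nanosecond counts here; the kernel uses struct bintime.
 */
struct dev_model {
	uint64_t start_count;
	uint64_t end_count;
	uint64_t bytes[4];
	uint64_t operations[4];
	uint64_t duration[4];	/* summed only when a start time is known */
	uint64_t busy_from;	/* time the device last went from idle to busy */
	uint64_t busy_time;	/* total time with a nonzero busy count */
};

/* Model of a transaction start taken at time "now". */
static void
model_start(struct dev_model *ds, uint64_t now)
{
	if (ds->start_count == ds->end_count)	/* idle -> busy */
		ds->busy_from = now;
	ds->start_count++;
}

/* Model of a transaction end; a NULL "then" means no duration is recorded. */
static void
model_end(struct dev_model *ds, uint32_t bytes, devstat_trans_flags flags,
    uint64_t now, const uint64_t *then)
{
	/* The busy count (start_count - end_count) must never go below zero. */
	assert(ds->start_count > ds->end_count);
	ds->bytes[flags] += bytes;
	ds->operations[flags]++;
	if (then != NULL)
		ds->duration[flags] += now - *then;
	ds->end_count++;
	if (ds->start_count == ds->end_count)	/* busy -> idle */
		ds->busy_time += now - ds->busy_from;
}
```

With two overlapping transactions started at times 100 and 150, a 4096-byte
read ending at 300 with a start time of 100, and a no-data command ending at
400 with no start time, the model records a read duration of 200, no duration
for the no-data command, and a busy time of 300.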
.Pp
.Fn devstat_end_transaction_bio_bt
is a wrapper for
.Fn devstat_end_transaction
that pulls all needed information from a
.Va "struct bio"
prepared by
.Fn devstat_start_transaction_bio .
The bio must be ready for
.Fn biodone
(i.e.,
.Fa bio_bcount
and
.Fa bio_resid
must be correctly initialized).
.Pp
The
.Va devstat
structure is composed of the following fields:
.Bl -tag -width dev_creation_time
.It sequence0 ,
.It sequence1
An implementation detail used to gather consistent snapshots of device
statistics.
.It start_count
Number of operations started.
.It end_count
Number of operations completed.
The
.Dq busy_count
can be calculated by subtracting
.Fa end_count
from
.Fa start_count ;
it is the current number of outstanding transactions for the device.
.Fa ( sequence0
and
.Fa sequence1
are used to get a consistent snapshot of both counters.)
The busy count should never go below zero, and on an idle device it should
be zero.
If either of these conditions does not hold, it indicates a problem.
.Pp
There should be one and only one
transaction start event and one transaction end event for each transaction.
.It dev_links
Each
.Va devstat
structure is placed in a linked list when it is registered.
The
.Va dev_links
field contains a pointer to the next entry in the list of
.Va devstat
structures.
.It device_number
The device number is a unique identifier for each device.
The device
number is incremented for each new device that is registered.
The device
number is currently only a 32-bit integer, but it could be enlarged if
someone has a system with more than four billion device arrival events.
.It device_name
The device name is a text string given by the registering driver to
identify itself
(e.g.,
.Dq da ,
.Dq cd ,
.Dq sa ,
etc.).
.It unit_number
The unit number identifies the particular instance of the peripheral driver
in question.
.It bytes[4]
This array contains the number of bytes that have been read (index
.Dv DEVSTAT_READ ) ,
written (index
.Dv DEVSTAT_WRITE ) ,
freed or erased (index
.Dv DEVSTAT_FREE ) ,
or other (index
.Dv DEVSTAT_NO_DATA ) .
All values are unsigned 64-bit integers.
.It operations[4]
This array contains the number of operations of a given type that have been
performed.
The indices are identical to those for
.Fa bytes
above.
.Dv DEVSTAT_NO_DATA
or
.Dq other
represents the number of transactions to the device which are
neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.It duration[4]
This array contains the total bintime corresponding to completed operations of
a given type.
The indices are identical to those for
.Fa bytes
above.
(Operations that complete using the historical
.Fn devstat_end_transaction
API and do not provide a non-NULL
.Fa then
are not accounted for.)
.It busy_time
This is the amount of time that the device busy count has been greater than
zero.
This is only updated when the busy count returns to zero.
.It creation_time
This is the time, as reported by
.Fn getmicrotime ,
at which the device was registered.
.It block_size
This is the block size of the device, if the device has a block size.
.It tag_types
This is an array of counters to record the number of various tag types that
are sent to a device.
See below for a list of tag types.
.It busy_from
If the device is not busy, this is the time that a transaction last completed.
If the device is busy, this is the more recent of the time that the device
became busy and the time that the last transaction completed.
.It flags
These flags indicate which statistics measurements are supported by a
particular device.
These flags are primarily intended to serve as an aid
to userland programs that decipher the statistics.
.It device_type
This is the device type.
It consists of three parts: the base device type
(e.g., direct access, CDROM, sequential access, etc.), the interface (IDE,
SCSI or other), and whether or not the device in question is a pass-through
driver.
See below for a complete list of device types.
.It priority
This is the priority.
This is the first parameter used to determine where
to insert a device in the
.Nm
list.
The second parameter is attach order.
See below for a list of available priorities.
.El
.Pp
Each device is given a device type.
Pass-through devices have the same underlying device type and interface as the
device they provide an interface for, but they also have the pass-through flag
set.
The base device types are identical to the
.Tn SCSI
device type numbers, so with
.Tn SCSI
peripherals, the device type returned from an inquiry is usually ORed with the
.Tn SCSI
interface type and the pass-through flag if appropriate.
The device type
flags are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_TYPE_DIRECT = 0x000,
	DEVSTAT_TYPE_SEQUENTIAL = 0x001,
	DEVSTAT_TYPE_PRINTER = 0x002,
	DEVSTAT_TYPE_PROCESSOR = 0x003,
	DEVSTAT_TYPE_WORM = 0x004,
	DEVSTAT_TYPE_CDROM = 0x005,
	DEVSTAT_TYPE_SCANNER = 0x006,
	DEVSTAT_TYPE_OPTICAL = 0x007,
	DEVSTAT_TYPE_CHANGER = 0x008,
	DEVSTAT_TYPE_COMM = 0x009,
	DEVSTAT_TYPE_ASC0 = 0x00a,
	DEVSTAT_TYPE_ASC1 = 0x00b,
	DEVSTAT_TYPE_STORARRAY = 0x00c,
	DEVSTAT_TYPE_ENCLOSURE = 0x00d,
	DEVSTAT_TYPE_FLOPPY = 0x00e,
	DEVSTAT_TYPE_MASK = 0x00f,
	DEVSTAT_TYPE_IF_SCSI = 0x010,
	DEVSTAT_TYPE_IF_IDE = 0x020,
	DEVSTAT_TYPE_IF_OTHER = 0x030,
	DEVSTAT_TYPE_IF_MASK = 0x0f0,
	DEVSTAT_TYPE_PASS = 0x100
} devstat_type_flags;
.Ed
.Pp
Devices have a priority associated with them, which controls roughly where
they are placed in the
.Nm
list.
The priorities are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_PRIORITY_MIN = 0x000,
	DEVSTAT_PRIORITY_OTHER = 0x020,
	DEVSTAT_PRIORITY_PASS = 0x030,
	DEVSTAT_PRIORITY_FD = 0x040,
	DEVSTAT_PRIORITY_WFD = 0x050,
	DEVSTAT_PRIORITY_TAPE = 0x060,
	DEVSTAT_PRIORITY_CD = 0x090,
	DEVSTAT_PRIORITY_DISK = 0x110,
	DEVSTAT_PRIORITY_ARRAY = 0x120,
	DEVSTAT_PRIORITY_MAX = 0xfff
} devstat_priority;
.Ed
.Pp
Each device also has flags indicating which operations are
supported or not supported.
The
.Va devstat_support_flags
values are as follows:
.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
.It DEVSTAT_ALL_SUPPORTED
Every statistic type is supported by the device.
.It DEVSTAT_NO_BLOCKSIZE
This device does not have a block size.
.It DEVSTAT_NO_ORDERED_TAGS
This device does not support ordered tags.
.It DEVSTAT_BS_UNAVAILABLE
This device supports a block size, but it is currently unavailable.
This
flag is most often used with removable media drives.
.El
.Pp
Transactions to a device fall into one of four categories, which are
represented in the
.Va flags
passed into
.Fn devstat_end_transaction .
The transaction types are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_NO_DATA = 0x00,
	DEVSTAT_READ = 0x01,
	DEVSTAT_WRITE = 0x02,
	DEVSTAT_FREE = 0x03
} devstat_trans_flags;
.Ed
.Pp
There are four possible values for the
.Va tag_type
argument to
.Fn devstat_end_transaction :
.Bl -tag -width DEVSTAT_TAG_ORDERED
.It DEVSTAT_TAG_SIMPLE
The transaction had a simple tag.
.It DEVSTAT_TAG_HEAD
The transaction had a head of queue tag.
.It DEVSTAT_TAG_ORDERED
The transaction had an ordered tag.
.It DEVSTAT_TAG_NONE
The device does not support tags.
.El
.Pp
The tag type values correspond to the lower four bits of the
.Tn SCSI
tag definitions.
In CAM, for instance, the
.Va tag_action
from the CCB is ANDed with 0xf to determine the tag type to pass in to
.Fn devstat_end_transaction .
.Pp
There is a macro,
.Dv DEVSTAT_VERSION ,
that is defined in
.In sys/devicestat.h .
This is the current version of the
.Nm
subsystem, and it should be incremented each time a change is made that
would require recompilation of userland programs that access
.Nm
statistics.
Userland programs use this version, via the
.Va kern.devstat.version
.Nm sysctl
variable, to determine whether they are in sync with the kernel
.Nm
structures.
.Sh SEE ALSO
.Xr systat 1 ,
.Xr devstat 3 ,
.Xr iostat 8 ,
.Xr rpc.rstatd 8 ,
.Xr vmstat 8
.Sh HISTORY
The
.Nm
statistics system appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
There may be a need for
.Fn spl
protection around some of the
.Nm
list manipulation code to ensure, for example, that the list of devices
is not changed while someone is fetching the
.Va kern.devstat.all
.Nm sysctl
variable.