.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2024 Oxide Computer Company
.\"
.Dd May 23, 2024
.Dt INTRO 9E
.Os
.Sh NAME
.Nm Intro
.Nd introduction to device driver entry points
.Sh DESCRIPTION
Section 9E of the manual describes the entry points and building blocks that are
used to build and implement all kinds of device drivers and kernel modules.
Oftentimes, modules and device drivers are talked about interchangeably.
The operating system is built around the idea of loadable kernel modules.
Device drivers are the primary type that we think about; however, there are
loadable kernel modules for file systems, STREAMS devices, and even system
calls!
.Pp
The vast majority of this section focuses on documenting device
.Pq and STREAMS
drivers.
Device drivers are further broken down into different categories depending on
what they are targeting.
For example, there are dedicated frameworks for SCSI/SAS HBA drivers, networking
drivers, USB drivers, and then general character and block device drivers.
While most of the time we think about device drivers as corresponding to a piece
of physical hardware, there are also pseudo-device drivers which are device
drivers that provide functionality, but aren't backed by any hardware.
For example,
.Xr dtrace 4D
and
.Xr lofi 4D
are both pseudo-device drivers.
.Pp
To help understand the relationship between these different types of things,
consider the following image:
.Bd -literal
+--------------------+
|                    |
|  Loadable Modules  |
|                    |
+--------------------+
  |                          +--------------+       +------------+
  |                          |              |       |            |
  +------------------------->| Cryptography |  ...  | Scheduling | ...
  |                          |              |       |            |
  |                          +--------------+       +------------+
  |   +----------------+       +--------------+       +--------------+
  |   |                |       |              |       |              |
  +-->| Device Drivers |  ...  | File Systems |  ...  | System Calls | ...
      |                |       |              |       |              |
      +----------------+       +--------------+       +--------------+
              v
  +-----------+
  |
  |   +------------+  +---------+       +-----------+       +-----------+
  +-->| Networking |->| igb(4D) |  ...  | mlxcx(4D) |  ...  | cxgbe(4D) | ...
  |   +------------+  +---------+       +-----------+       +-----------+
  |
  |   +-------+       +----------+       +-------------+       +----------+
  +-->|  HBA  |------>| smrt(4D) |  ...  | mpt_sas(4D) |  ...  | ahci(4D) | ...
  |   +-------+       +----------+       +-------------+       +----------+
  |
  |   +-------+       +--------------+       +----------+       +---------+
  +-->|  USB  |------>| scsa2usb(4D) |  ...  | ccid(4D) |  ...  | hid(4D) | ...
  |   +-------+       +--------------+       +----------+       +---------+
  |
  |   +---------+     +-------------+       +-------------+
  +-->| Sensors |---->| smntemp(4D) |  ...  | pchtemp(4D) | ...
  |   +---------+     +-------------+       +-------------+
  |
  +-------+-------------+-----------+----------+
          |             v           v          |
          v       +-----------+  +-----+       v
      +-------+   | Character |  | USB |   +-------+
      | Audio |   | and Block |  | HCD |   | Nexus | ...
      +-------+   |  Devices  |  +-----+   +-------+
                  +-----------+
.Ed
.Pp
The above diagram attempts to explain some of the relationships that were
mentioned above at a high level.
All device drivers are loadable modules that leverage the
.Xr modldrv 9S
structure and implement similar
.Xr _init 9E
and
.Xr _fini 9E
entry points.
.Pp
Some hardware implements more than one type of thing.
The most common example here would be a NIC that implements a temperature sensor
or a current sensor.
Many devices also implement and leverage the kernel statistics framework called
.Dq kstats .
A device driver is not strictly limited to only a single class of thing.
For example, many USB client devices are networking device drivers.
In the subsequent sections we'll go into the functions and structures that are
related to creating the different device drivers and their associated
functions.
.Ss Kernel Initialization
To begin with, all loadable modules in the system are required to implement
three entry points.
If these entry points are not present, then the module cannot be installed in
the system.
These entry points are
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E .
.Pp
The
.Xr _init 9E
entry point will be the first thing called in the module and this is where
any global initialization should be taken care of.
Once all global state has been successfully created, the driver should call
.Xr mod_install 9F
to actually register with the system.
Conversely,
.Xr _fini 9E
is used to tear down the module.
The driver uses
.Xr mod_remove 9F
to first remove the driver from the system and then it can tear down any global
state that was added there.
.Pp
While we mention global state here, this isn't widely used in most device
drivers.
A device driver can have multiple instances instantiated, one for each instance
of a hardware device that is found, and most state is tied to those instances.
We'll discuss that more in the next section.
.Pp
The
.Xr _info 9E
entry point these days just calls
.Xr mod_info 9F
directly and returns its result.
.Pp
All of these entry points directly or indirectly require a
.Vt "struct modlinkage" .
This structure is used by all types of loadable kernel modules and is filled in
with information that varies based on the type of module one is creating.
Here, everything that we're creating is going to use a
.Vt "struct modldrv" ,
which describes a loadable driver.
Every device driver will declare a static global variable for these and fill
them out.
They are documented in
.Xr modlinkage 9S
and
.Xr modldrv 9S
respectively.
.Pp
The following is an example of these structures borrowed from
.Xr igc 4D :
.Bd -literal
static struct modldrv igc_modldrv = {
	.drv_modops = &mod_driverops,
	.drv_linkinfo = "Intel I226/226 Ethernet Controller",
	.drv_dev_ops = &igc_dev_ops
};

static struct modlinkage igc_modlinkage = {
	.ml_rev = MODREV_1,
	.ml_linkage = { &igc_modldrv, NULL }
};
.Ed
.Pp
From this there are a few important things to take away.
A single kernel module may implement more than one type of linkage, though this
is the exception and not the norm.
The second part to call out here is that while the
.Fa drv_modops
will be the same for all drivers that use the
.Vt "struct modldrv" ,
the
.Fa drv_linkinfo
and
.Fa drv_dev_ops
will be unique to each driver.
The next section discusses the
.Vt "struct dev_ops" .
.Ss The Devices Tree and Instances
Device drivers have a unique challenge that makes them different from other
kinds of loadable modules: there may very well be more than a single instance of
the hardware that they support.
Consider a few examples: a user can plug in two distinct USB mass storage
devices or keyboards.
A system may have more than one NIC present or the hardware may expose multiple
physical ports as distinct devices.
Many systems have more than one disk device.
Conversely, if a given piece of hardware isn't present then there's no reason
for the driver for it to be loaded.
There is nothing that the Intel 1 GbE Ethernet NIC driver,
.Xr igb 4D ,
can do if there are no supported devices plugged in.
.Pp
Devices are organized into a tree that is full of parent and child
relationships.
This tree is what you see when you run
.Xr prtconf 8 .
As an example, a USB device is plugged into a port on a hub, which may be
plugged into another hub, and then is eventually plugged into a PCI device that
is the USB host controller, which itself may be under a PCI-PCI bridge, and this
chain continues all the way up to the root of the tree, which we call
.Dq rootnex .
Device drivers that can enumerate children and provide operations for them are
called
.Dq nexus
drivers.
.Pp
The system automatically fills out the device tree through a combination of
built-in mechanisms and through operations on other nexus drivers.
When a new hardware unit is discovered, a
.Vt dev_info_t
structure, the device information, is created for it and it is linked into the
tree.
Generally, the system can then use automatic information embedded in the device
to determine what driver is responsible for the piece of hardware through the
use of the
.Dq compatible
property which the system and nexus drivers set up on their children.
For example, PCI and PCIe drivers automatically set up the compatible property
based on information discovered in PCI configuration space like the device's
vendor ID, device ID, and class IDs.
The same is true of USB.
.Pp
When a device driver is packaged, it contains metadata that indicates which
devices it supports.
For example, the aforementioned igb driver will have a rule that it matches
.Dq pciex8086,10a7 .
When the kernel discovers a device with this alias present, it will know that it
should assign it to the igb driver and then it will assign the
.Vt dev_info_t
structure a new instance number.
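.Pp
To make the matching and numbering concrete, here is a minimal userland sketch
in C.
It is a model of the idea only and is not how the kernel implements it.
The table, the
.Dq widget
driver, its alias, and every name beginning with model_ are hypothetical; only
the igb alias comes from the text above.
The real system does considerably more work than this when it binds a driver
and picks an instance number.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a driver and the next instance number to hand out. */
typedef struct model_driver {
	const char *md_name;
	int md_next_instance;
} model_driver_t;

/* Hypothetical alias entry: a compatible string and the driver it selects. */
typedef struct model_alias {
	const char *ma_compat;
	model_driver_t *ma_driver;
} model_alias_t;

static model_driver_t model_igb = { "igb", 0 };
static model_driver_t model_widget = { "widget", 0 };

static const model_alias_t model_aliases[] = {
	{ "pciex8086,10a7", &model_igb },	/* alias taken from the text */
	{ "example,widget", &model_widget }	/* made up for illustration */
};

/*
 * Look up the driver for a compatible string. On a match, store the driver
 * name in *namep and return a fresh instance number for that driver. Return
 * -1 if no driver claims the device.
 */
static int
model_bind(const char *compat, const char **namep)
{
	size_t i;
	size_t n = sizeof (model_aliases) / sizeof (model_aliases[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(model_aliases[i].ma_compat, compat) == 0) {
			model_driver_t *drv = model_aliases[i].ma_driver;

			*namep = drv->md_name;
			return (drv->md_next_instance++);
		}
	}
	return (-1);
}
```

Binding the igb alias twice in this model yields instance numbers 0 and 1, one
per discovered device, while a compatible string that no driver lists is left
unbound.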
.Pp
To emphasize here, each time the device is discovered in the tree, it will have
an independent instance number and an independent
.Vt dev_info_t
that accompanies it.
Each instance has an independent lifetime too.
The most obvious way to think about this is with something that can be
physically removed while the system is on, like a USB device.
Just because you pull one USB keyboard doesn't mean it impacts the other one
there.
They are inherently different devices
.Po
albeit if they were plugged into the same hub and the hub was removed, then they
both would be removed; however, each would be acted on independently
.Pc .
.Pp
Here is a slimmed down example from a system's
.Xr prtconf 8
output:
.Bd -literal
Oxide,Gimlet (driver name: rootnex)
    scsi_vhci, instance #0 (driver name: scsi_vhci)
    pci, instance #0 (driver name: npe)
        pci1022,1480, instance #13 (driver name: amdzen_stub)
        pci1022,164f
        pci1022,1482
        pci1de,fff9, instance #0 (driver name: pcieb)
            pci1344,3100, instance #4 (driver name: nvme)
                blkdev, instance #10 (driver name: blkdev)
        pci1022,1482
        pci1022,1482
        pci1de,fff9, instance #1 (driver name: pcieb)
            pci1b96,0, instance #7 (driver name: nvme)
                blkdev, instance #0 (driver name: blkdev)
        pci1de,fff9, instance #2 (driver name: pcieb)
            pci1b96,0, instance #8 (driver name: nvme)
                blkdev, instance #4 (driver name: blkdev)
        pci1de,fff9, instance #3 (driver name: pcieb)
            pci1b96,0, instance #10 (driver name: nvme)
                blkdev, instance #1 (driver name: blkdev)
.Ed
.Pp
From this we can see that there are multiple instances of the NVMe
.Pq nvme ,
PCIe bridge
.Pq pcieb ,
and
generic block device
.Pq blkdev
driver present.
Each of these has its own
.Vt dev_info_t
and has its various entry points called in parallel.
With that, let's dig into the specifics of what the
.Vt "struct dev_ops"
actually is and the different operations to be aware of.
.Ss struct dev_ops
The device operations structure,
.Vt "struct dev_ops" ,
controls all of the basic entry points that a loadable device contains.
This is something that every driver has to implement, no matter the type.
The most important things that will be present are the
.Fa devo_attach
and
.Fa devo_detach
members which are used to create and destroy instances of the driver and then a
pointer to any subsequent operations that exist, such as the
.Fa devo_cb_ops ,
which is used for character and block device drivers and the
.Fa devo_bus_ops ,
which is used for nexus drivers.
.Pp
Attach and detach are the most important entry points in this structure.
The attach entry point can practically be thought of as the
.Dq main
function entry point for a device driver.
This is where any initialization of the instance will occur.
This would include many traditional things like setting up access to registers,
allocating and assigning interrupts, and interfacing with the various other
device driver frameworks such as
.Xr mac 9E .
.Pp
The actions taken here are generally device-specific, while certain classes of
devices
.Pq e.g. PCI, USB, etc.
will have overlapping concerns.
In addition, this is where the driver will take care of creating anything like a
minor node which will be used to access it by userland software if it's a
character or block device driver.
.Pp
There is generally a per-instance data structure that a driver creates.
It may do this by calling
.Xr kmem_zalloc 9F
and assigning the structure with the
.Xr ddi_set_driver_private 9F
entry point or it may use the DDI's soft state management functions rooted in
.Xr ddi_soft_state_init 9F .
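.Pp
The idea behind per-instance state can be sketched in userland C.
This is a model and not the DDI: it uses the C library's allocator rather than
the kernel's, and the
.Vt widget_t
and
.Vt inst_table_t
types and the inst_table_ functions are hypothetical names invented for
illustration.
What it shows is a table keyed by instance number that grows on demand, so
there is no compiled-in limit on how many instances can attach.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-instance state for an imaginary "widget" driver. */
typedef struct widget {
	int w_instance;
	unsigned int w_nintrs;
} widget_t;

/* Hypothetical growable table mapping an instance number to its state. */
typedef struct inst_table {
	void **it_slots;
	size_t it_nslots;
	size_t it_size;		/* size of each per-instance structure */
} inst_table_t;

static void
inst_table_init(inst_table_t *t, size_t size)
{
	t->it_slots = NULL;
	t->it_nslots = 0;
	t->it_size = size;
}

/*
 * Allocate zeroed state for an instance, growing the table if the instance
 * number is larger than anything seen so far. Returns 0 on success and -1 if
 * the instance is invalid, already present, or memory is exhausted.
 */
static int
inst_table_zalloc(inst_table_t *t, int inst)
{
	if (inst < 0)
		return (-1);
	if ((size_t)inst >= t->it_nslots) {
		size_t n = (size_t)inst + 1;
		void **slots = realloc(t->it_slots, n * sizeof (void *));

		if (slots == NULL)
			return (-1);
		memset(slots + t->it_nslots, 0,
		    (n - t->it_nslots) * sizeof (void *));
		t->it_slots = slots;
		t->it_nslots = n;
	}
	if (t->it_slots[inst] != NULL)
		return (-1);
	t->it_slots[inst] = calloc(1, t->it_size);
	return (t->it_slots[inst] != NULL ? 0 : -1);
}

/* Return the state for an instance, or NULL if it has none. */
static void *
inst_table_get(const inst_table_t *t, int inst)
{
	if (inst < 0 || (size_t)inst >= t->it_nslots)
		return (NULL);
	return (t->it_slots[inst]);
}

/* Release the state for an instance; the slot may be reused later. */
static void
inst_table_free(inst_table_t *t, int inst)
{
	if (inst < 0 || (size_t)inst >= t->it_nslots)
		return;
	free(t->it_slots[inst]);
	t->it_slots[inst] = NULL;
}
```

In a real driver the allocation would happen in
.Xr attach 9E
and the release in
.Xr detach 9E ,
using the kernel interfaces named above rather than this model.
.Pp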
A driver should try to tie as much state to the instance as possible.
There should not be anything like a fixed size global array of possible
instances.
Someone usually finds a way to attach many more instances of some type of
hardware than you might expect!
.Pp
The
.Xr attach 9E
and
.Xr detach 9E
entry points both have a unique command argument that is used to describe a
specific action that is going on.
This action may be a normal attach or it could be related to putting the system
into the ACPI S3 sleep or similar state with the suspend and resume commands.
.Pp
The following table lists the common
.Vt "struct dev_ops"
entry points that most drivers end up having to think a little bit about:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr attach 9E Ta Xr detach 9E
.It Xr getinfo 9E Ta Xr quiesce 9E
.El
.Pp
Briefly, the
.Xr getinfo 9E
entry point is used to map between instances of a device driver and the minor
nodes it creates.
Drivers that participate in a framework like the SCSI HBA, Networking, or
related don't usually end up implementing this.
However, drivers that manually create minor nodes generally do.
The
.Xr quiesce 9E
entry point is used as part of the fast reboot operation.
It is basically intended to stop and/or reset the hardware and discard any
ongoing I/O.
Pseudo-device drivers or drivers which do not perform I/O can use the
symbol
.Ql ddi_quiesce_not_needed
in lieu of a standard implementation.
.Pp
The following additional entry points exist, but are less commonly
required because the system generally takes care of them, as is the case with
.Xr probe 9E .
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr identify 9E Ta Xr power 9E
.It Xr probe 9E Ta
.El
.Pp
For more information on the structure, see also
.Xr dev_ops 9S .
The following are a few examples of the
.Vt "struct dev_ops"
structure from a few drivers.
We recommend using the C99 style for all new instances.
.Bd -literal
static struct dev_ops ksensor_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = ksensor_getinfo,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = ksensor_attach,
	.devo_detach = ksensor_detach,
	.devo_reset = nodev,
	.devo_power = ddi_power,
	.devo_quiesce = ddi_quiesce_not_needed,
	.devo_cb_ops = &ksensor_cb_ops
};

static struct dev_ops igc_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = NULL,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = igc_attach,
	.devo_detach = igc_detach,
	.devo_reset = nodev,
	.devo_quiesce = ddi_quiesce_not_supported,
	.devo_cb_ops = &igc_cb_ops
};

static struct dev_ops pchtemp_dev_ops = {
	.devo_rev = DEVO_REV,
	.devo_refcnt = 0,
	.devo_getinfo = nodev,
	.devo_identify = nulldev,
	.devo_probe = nulldev,
	.devo_attach = pchtemp_attach,
	.devo_detach = pchtemp_detach,
	.devo_reset = nodev,
	.devo_quiesce = ddi_quiesce_not_needed
};
.Ed
.Ss Character and Block Operations
In the history of UNIX, the most common device drivers that were created were
for block and character devices.
The interfaces in block and character devices are usually in service of common
I/O patterns that the system exposes.
For example, when you call
.Xr open 2 ,
.Xr ioctl 2 ,
or
.Xr read 2
on a device, it goes through the device's corresponding entry point here.
Both block and character devices operate on the shared
.Vt "struct cb_ops"
structure, with different members being expected for both of them.
While they both require that someone implement the
.Fa cb_open
and
.Fa cb_close
members, block devices perform I/O through the
.Xr strategy 9E
entry point and support the
.Xr dump 9E
entry point for kernel crash dumps, while character devices implement the more
historically familiar
.Xr read 9E ,
.Xr write 9E ,
and the
.Xr devmap 9E
entry point for supporting memory-mapping.
.Pp
While the device operations structures worked with the
.Vt dev_info_t
structure and there was one per-instance, character and block operations work
with minor nodes: named entities that exist in the file system.
UNIX has long had the idea of a major and minor number that is encoded in the
.Vt dev_t
which is embedded in the file system, which is what you see in the
.Fa st_rdev
member of the stat structure when you call
.Xr stat 2 .
The major number is assigned to the driver
.Em as a whole ,
not an instance.
The minor number space is shared between all instances of a driver.
Minor node numbers are assigned by the driver when it calls
.Xr ddi_create_minor_node 9F
to create a minor node and when one of its character or block entry points is
called, it will get this minor number back and it must translate it to the
corresponding instance on its own.
.Pp
A special property of the
.Xr open 9E
entry point is that it can change the minor number a client gets during its call
to open which it will use for all subsequent calls.
This is called a
.Dq cloning
open.
Whether this is used or not depends on the type of driver that you are creating.
For example, many pseudo-device drivers like DTrace will use this so each client
has its own state.
Similarly, devices that have certain internal locking and transaction schemes
will give each caller a unique minor.
The
.Xr ccid 4D
and
.Xr nvme 4D
drivers are examples of this.
However, many drivers will have just a single minor node per instance and just
say that the minor node's number is the instance number, making it very simple
to figure out the mapping.
When it's not so simple, often an AVL tree or some other structure is used to
help map this together.
.Pp
The following entry points are generally used for character devices:
.Bl -tag -width Ds
.It Xr ioctl 9E
The I/O control or ioctl entry point is used extensively throughout the system
to perform different kinds of operations.
These operations are often driver specific, though there are also some
common operations that are used across multiple devices like the disk
operations described in
.Xr dkio 4I
or the ioctls that are used under the hood by
.Xr cfgadm 8
and friends.
.Pp
Whether ioctls are supported or not is up to the driver.
If it does support them, it is up to the driver to always perform any requisite
privilege and permission checking as well as take care in copying in and out
any kind of memory from the user process through calls like
.Xr ddi_copyin 9F
and
.Xr ddi_copyout 9F .
.Pp
The ioctl interface gives the driver writer great flexibility to create equally
useful or hard to consume interfaces.
When crafting a new committed interface over an ioctl, take care to ensure there
is an ability to version the structure or use something that has more
flexibility like a
.Vt nvlist_t .
See the
.Sq Copying Data to and from Userland
section of
.Xr Intro 9F
for more information.
.It Xr read 9E , Xr write 9E , Xr aread 9E , and Xr awrite 9E
These are the classic I/O routines of the system.
A driver's read and write routines operate on a
.Xr uio 9S
structure which describes the I/O that is occurring, the offset into the
device that the I/O should occur at, and has various flags that
describe properties of the I/O request, such as whether or not it is a
non-blocking request.
.Pp
The majority of device drivers that implement these entry points are using them
to create some kind of file-like abstraction for a device.
For example, the
.Xr ccid 4D
driver uses these interfaces for submitting commands and reading responses back
from an underlying device.
.Pp
For most use cases
.Xr read 9E
and
.Xr write 9E
are sufficient; however, the
.Xr aread 9E
and
.Xr awrite 9E
are versions that tie into the kernel's asynchronous I/O engine.
.It Xr chpoll 9E
This entry point allows a device to be polled by user code for an event of
interest and connects through the kernel to different polling mechanisms such as
.Xr poll 2 ,
.Xr port_get 3C ,
and many others.
Currently this interface only allows a driver to define the classic poll style
events such as
.Dv POLLIN ,
.Dv POLLOUT ,
and
.Dv POLLHUP .
The exact semantics of these are up to the driver; however, it is expected that
the read and write oriented semantics of the various events will be honored by
the device driver.
.It Xr devmap 9E and Xr segmap 9E
These are entry points that are used to set up memory mappings for a device and
replace the older
.Xr mmap 9E
entry point.
When a process calls
.Xr mmap 2
on a device, it'll reach these, starting with the
.Xr devmap 9E
entry point.
The driver is responsible for confirming that the mapping request and its
semantics are sensible, after which it will set up memory for consumption.
The
.Xr devmap 9E
manual page has more details on the specifics here and the related entry points
that can be implemented as part of the
.Xr devmap_callback_ctl 9S
structures such as
.Xr devmap_access 9E .
The segment mapping is an optional part that provides some additional controls
for a driver such as assigning certain mapping attributes or wanting to maintain
separate contexts for different mappings.
See
.Xr segmap 9E
for more information.
It is common for drivers to just provide a
.Xr devmap 9E
entry point.
.It Xr prop_op 9E
This entry point is used for drivers to manage and deal with property creation.
While this is its own entry point, most callers can just specify
.Xr ddi_prop_op 9F
for this and don't need any special handling.
.El
.Pp
The following entry points are used uniquely for block devices:
.Bl -tag -width Ds
.It Xr strategy 9E
A driver's strategy entry point is used to actually perform I/O as described by
the
.Xr buf 9S
structure.
It is responsible for allocating all resources and then initiating the actual
request.
The actual request will finish potentially asynchronously through calls to
.Xr biodone 9F
or
.Xr bioerror 9F .
HBA or blkdev-based drivers do not usually end up implementing this interface.
.It Xr dump 9E
A driver's dump implementation is used when the operating system has had a fatal
error and is trying to persist a crash dump to disk.
This is a delicate operation as the system has already failed, which means many
normal operations like interrupt handlers, timeouts, and blocking will no longer
work.
.El
.Pp
In general, the
.Xr print 9E
entry point for block devices is vestigial and users should fill in
.Xr nodev 9F
there instead.
.Pp
The following are some examples of different character device operations
structures that drivers have employed.
Note that using C99 structure definitions is preferred:
.Bd -literal
static struct cb_ops ksensor_cb_ops = {
	.cb_open = ksensor_open,
	.cb_close = ksensor_close,
	.cb_strategy = nodev,
	.cb_print = nodev,
	.cb_dump = nodev,
	.cb_read = nodev,
	.cb_write = nodev,
	.cb_ioctl = ksensor_ioctl,
	.cb_devmap = nodev,
	.cb_mmap = nodev,
	.cb_segmap = nodev,
	.cb_chpoll = nochpoll,
	.cb_prop_op = ddi_prop_op,
	.cb_flag = D_MP,
	.cb_rev = CB_REV,
	.cb_aread = nodev,
	.cb_awrite = nodev
};

static struct cb_ops vio9p_cb_ops = {
	.cb_rev = CB_REV,
	.cb_flag = D_NEW | D_MP,
	.cb_open = vio9p_open,
	.cb_close = vio9p_close,
	.cb_read = vio9p_read,
	.cb_write = vio9p_write,
	.cb_ioctl = vio9p_ioctl,
	.cb_strategy = nodev,
	.cb_print = nodev,
	.cb_dump = nodev,
	.cb_devmap = nodev,
	.cb_mmap = nodev,
	.cb_segmap = nodev,
	.cb_chpoll = nochpoll,
	.cb_prop_op = ddi_prop_op,
	.cb_str = NULL,
	.cb_aread = nodev,
	.cb_awrite = nodev,
};

static struct cb_ops bd_cb_ops = {
	bd_open,		/* open */
	bd_close,		/* close */
	bd_strategy,		/* strategy */
	nodev,			/* print */
	bd_dump,		/* dump */
	bd_read,		/* read */
	bd_write,		/* write */
	bd_ioctl,		/* ioctl */
	nodev,			/* devmap */
	nodev,			/* mmap */
	nodev,			/* segmap */
	nochpoll,		/* poll */
	bd_prop_op,		/* cb_prop_op */
	0,			/* streamtab */
	D_64BIT | D_MP,		/* Driver compatibility flag */
	CB_REV,			/* cb_rev */
	bd_aread,		/* async read */
	bd_awrite		/* async write */
};
.Ed
.Ss Networking Drivers
Networking device drivers come in many forms and flavors.
They may interface to the host via PCIe, USB, be a pseudo-device, or use
something entirely different like SPI
.Pq Serial Peripheral Interface .
The system provides a dedicated networking interface driver framework that is
documented in
.Xr mac 9E .
This framework is sometimes also referred to as GLDv3
.Pq Generic LAN Device version 3 .
.Pp
All networking drivers will still implement a basic
.Vt "struct dev_ops"
and a minimal
.Vt "struct cb_ops" .
The
.Xr mac 9E
framework takes care of implementing all of the standard character device entry
points at the end of the day and instead provides a number of different
networking-specific entry points that take care of things like getting and
setting properties, installing and removing MAC addresses and filters, and
actually transmitting and providing callbacks for receiving packets.
.Pp
Each instance of a device driver will generally have a separate registration
with
.Xr mac 9E .
In other words, there is usually a one to one relationship between a driver
having its
.Xr attach 9E
entry point called and it registering with the
.Xr mac 9E
framework.
.Ss STREAMS Modules
STREAMS modules are a historical way to provide certain services in the kernel.
For networking device drivers, instead see the prior section and
.Xr mac 9E .
Conceptually STREAMS break things into queues, with one side being designed for
a module to read data and another side for it to write or produce data.
These modules are arranged in a stack, with additional modules being pushed on
for additional processing.
For example, the TTY subsystem has a serial console as a base STREAMS module,
but it then pushes on additional modules like the pseudo-terminal emulation
.Po
.Xr ptem 4M
.Pc ,
the standard line discipline
.Po
.Xr ldterm 4M
.Pc ,
etc.
.Pp
STREAMS drivers don't use the normal character device entry points
.Pq though sometimes they do define them
or even the
.Vt "struct modldrv" .
Instead they use the
.Vt "struct modlstrmod"
which is discussed in
.Xr modlstrmod 9S ,
which in turn requires one to fill out the
.Xr fmodsw 9S ,
.Xr streamtab 9S ,
and
.Xr qinit 9S
structures.
The latter of these has two of the more common entry points:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr put 9E Ta Xr srv 9E
.El
.Pp
These entry points are used when different kinds of messages are received by the
device driver on a queue.
In addition, those entry points define an alternative set of entry points for
.Xr open 9E
and
.Xr close 9E
as a STREAMS module's open and close routines all operate in the context of a
given
.Vt queue_t .
There are other differences here.
An ioctl is not a dedicated entry point, but rather a specific message type
.Po
.Dv M_IOCTL
.Pc
that is
received in a driver's
.Xr put 9E
routine.
.Pp
Finally, it's worth noting the
.Xr mt-streams 9F
manual page which discusses several concurrency related considerations for
STREAMS related drivers.
.Ss HBA Drivers
Host bus adapters are used to interface with the various SCSI and SAS
controllers.
Like with networking, the kernel provides a framework under the name of SCSA.
HBA drivers still often implement character device entry points; however, they
generally end up calling into shared framework entry points for
.Xr open 9E ,
.Xr ioctl 9E ,
and
.Xr close 9E .
For several of the concepts related to the third version of the framework, see
.Xr iport 9 .
.Pp
The following entry points are associated with HBA drivers:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr tran_abort 9E Ta Xr tran_bus_reset 9E
.It Xr tran_dmafree 9E Ta Xr tran_getcap 9E
.It Xr tran_init_pkt 9E Ta Xr tran_quiesce 9E
.It Xr tran_reset 9E Ta Xr tran_reset_notify 9E
.It Xr tran_setup_pkt 9E Ta Xr tran_start 9E
.It Xr tran_sync_pkt 9E Ta Xr tran_tgt_free 9E
.It Xr tran_tgt_init 9E Ta Xr tran_tgt_probe 9E
.El
.Pp
In addition to these, when using SCSAv3 with iports, drivers will call
.Xr scsi_hba_iport_register 9F
to create various iports.
This has the unique effect of causing the driver's top-level
.Xr attach 9E
entry point to be called again, but referring to the iport instead of the main
hardware instance.
.Ss USB Drivers
The kernel provides a framework for USB client devices to access various USB
services such as getting access to device and configuration descriptors, issuing
control, bulk, interrupt, and isochronous requests, and being notified when they
are removed from the system.
Generally a USB device driver leverages a framework of some kind, like
.Xr mac 9E
in addition to the USB pieces.
As such, there are no entry points specific to USB device drivers; however,
there are plenty of provided functions.
.Pp
To get started with a USB device driver, one will generally perform some of the
following steps:
.Bl -enum
.It
Register with the USB framework by calling
.Xr usb_client_attach 9F .
.It
Ask the kernel to fetch all of the device and class descriptors that are
appropriate with the
.Xr usb_get_dev_data 9F
function.
.It
Parse the relevant descriptors to figure out which endpoints to attach.
.It
Open up pipes to the specific USB endpoints by using
.Xr usb_lookup_ep_data 9F ,
.Xr usb_ep_xdescr_fill 9F ,
and
.Xr usb_pipe_xopen 9F .
.It
Proceed with the rest of device initialization and service.
.El
.Ss Sensors
Many devices embed sensors in them, such as a networking ASIC that tracks its
junction temperature.
The kernel provides the
.Xr ksensor 9E
.Pq kernel sensor
framework to allow device drivers to implement sensors with a minimal set of
callback functions.
Any device driver, whether it's providing services through another framework or
not, can implement the ksensor operations.
Drivers do not need to implement any character device operations directly.
They are instead provided via the
.Xr ksensor 4D
driver.
.Pp
A driver registers with the ksensor framework during its
.Xr attach 9E
entry point
and must implement the functions described in
.Xr ksensor_ops 9E
for each sensor that it creates.
These interfaces include:
.Bl -column -offset -indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr kso_kind 9E Ta Xr kso_scalar 9E
.El
.Ss Virtio Drivers
The kernel provides an uncommitted interface for Virtio device drivers, which is
discussed in some detail in
.Pa uts/common/io/virtio/virtio.h .
A client device driver will register with the framework and then use that
registration to begin feature and interrupt negotiation.
As part of that, it is given the ability to set up virtqueues which can be
used for communicating to and from the hypervisor.
.Ss Kernel Statistics
Drivers have the ability to export kstats
.Pq kernel statistics
that will appear in the output of the
.Xr kstat 8
command.
Any kind of module in the system can create and register a kstat; it is not
strictly tied to anything like a
.Vt dev_info_t .
kstats come in several different types.
The most common kstat type is the
.Dv KSTAT_TYPE_NAMED
which allows for multiple, typed name-value pairs to be part of the stat.
This is what the kernel uses under the hood for many things such as the various
.Xr mac 9E
statistics that are managed on behalf of drivers.
.Pp
To create a kstat, a driver utilizes the
.Xr kstat_create 9F
function, after which it has a chance to set up the kstat and make choices about
which entry points it will implement.
A kstat will not be made visible until the caller calls
.Xr kstat_install 9F
on it.
The two entry points that a driver may implement are:
.Bl -column -offset -indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr ks_snapshot 9E Ta Xr ks_update 9E
.El
.Pp
First, let's discuss the
.Xr ks_update 9E
entry point.
A kstat may be updated in one of two ways: either by having its
.Xr ks_update 9E
function called or by having the system update information as it goes in the
kstat's data.
One would use the former when it involves doing something like going out to
hardware and reading registers, whereas the latter approach might be used when
operations can be tracked as part of a normal flow, such as the number of errors
or particular requests a driver has encountered.
The
.Xr ks_snapshot 9E
entry point is not as commonly used by comparison and allows a caller to
interpose on the data marshalling process for copying out to userland.
.Ss Upgradable Firmware Modules
The UFM
.Pq Upgradable Firmware Module
system in the kernel allows a device driver to provide information about the
firmware modules that are present on a device and is generally used as
supplementary information about a device.
The UFM framework allows a driver to declare a given number of modules that
exist on a given
.Vt dev_info_t .
Each module has some number of slots with different versions.
This information is automatically exported into various consumers such as
.Xr fwflash 8 ,
the Fault Management Architecture,
and the
.Xr ufm 4D
driver's specific ioctls.
.Pp
A driver fills in the operations vector discussed in
.Xr ddi_ufm 9E
and registers it with the kernel by calling
.Xr ddi_ufm_init 9F .
The entry points in that operations vector include:
.Bl -column -offset -indent "ddi_ufm_op_fill_image(9E)" "ddi_ufm_op_fill_image(9E)"
.It Xr ddi_ufm_op_getcaps 9E Ta Xr ddi_ufm_op_nimages 9E
.It Xr ddi_ufm_op_fill_image 9E Ta Xr ddi_ufm_op_fill_slot 9E
.It Xr ddi_ufm_op_readimg 9E Ta
.El
.Pp
The
.Xr ddi_ufm_op_getcaps 9E
entry point describes the capabilities of the device and what other entry points
the kernel and callers can expect to exist.
The
.Xr ddi_ufm_op_nimages 9E
entry point tells the system how many images there are; if it is not
implemented, then the system assumes there is a single image.
The
.Xr ddi_ufm_op_fill_image 9E
and
.Xr ddi_ufm_op_fill_slot 9E
entry points are used to fill in information about images and slots
respectively, while the
.Xr ddi_ufm_op_readimg 9E
entry point is used to read an image from the device for the operating system.
That entry point is often supported when dealing with EEPROMs as many devices do
not have a way of retrieving the actual current firmware.
.Ss USB Host Interface Drivers
Opposite of USB device drivers are the device drivers that make the USB
abstractions work: USB host interface controllers.
The kernel provides a private framework for these, which is discussed in
.Xr usba_hcdi 9E .
An HCDI driver is a character device driver that also instantiates a root hub as
part of its operation and forwards many of its open, close, and ioctl routines
to the corresponding usba hubdi functions.
.Pp
To get started with the framework, a driver will need to call
.Xr usba_hcdi_register 9F
with a filled out
.Xr usba_hcdi_register_args_t 9S
structure.
That registration structure includes the operation vector of callbacks that the
driver fills in, which involve opening and closing pipes
.Po
.Xr usba_hcdi_pipe_open 9E
.Pc ,
issuing the various control, interrupt, bulk, and isochronous transfers
.Po
.Xr usba_hcdi_pipe_bulk_xfer 9E ,
etc.
.Pc ,
and more.
.Sh DTRACE PROBES
By default, the DTrace
.Xr fbt 4D
.Pq function boundary tracing
provider will create DTrace probes based on the entry and return points
of most functions in a module
.Pq the primary exception being for some hand-written assembler .
While this is very powerful, there are often times when driver writers
want to define their own semantic probes.
The
.Xr sdt 4D
.Pq statically defined tracing
provider can be used for this.
.Pp
To define an SDT probe, a driver should include
.In sys/sdt.h ,
which defines several macros for probes based on the number of arguments
that are present.
Each probe takes a name, which is constrained by the rules of a C
identifier.
If two underscore characters are present in a row
.Pq Sq __
they will be transformed into a hyphen
.Pq Sq - .
That is, a probe declared with a name of
.Sq hello__world
will be named
.Sq hello-world
and accessible as the DTrace probe
.Ql sdt:::hello-world .
.Pp
Each probe can present a varying number of arguments in DTrace, ranging
from 0 to 8.
For each DTrace probe argument, one passes both the type of the argument
and the actual value.
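.Pp
The double-underscore naming rule can be illustrated with a short userland C
sketch.
This is not how the kernel or DTrace performs the transformation;
.Fn sdt_probe_name
is a hypothetical helper that exists only for this illustration, and its
left-to-right pairing of underscores in runs of three or more is an assumption.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Userland illustration only (hypothetical helper, not a kernel or DTrace
 * API): convert an SDT macro name such as "hello__world" into the probe
 * name that DTrace presents, "hello-world". Each pair of consecutive
 * underscores becomes one hyphen; a lone underscore is copied unchanged.
 * Returns 0 on success or -1 if the output buffer is too small.
 */
int
sdt_probe_name(const char *macro_name, char *out, size_t outlen)
{
	size_t o = 0;

	for (size_t i = 0; macro_name[i] != 0; i++) {
		char c = macro_name[i];

		if (c == '_' && macro_name[i + 1] == '_') {
			c = '-';
			i++;	/* consume the second underscore */
		}
		/* Always leave room for the terminating NUL byte. */
		if (o + 1 >= outlen)
			return (-1);
		out[o++] = c;
	}
	if (outlen == 0)
		return (-1);
	out[o] = 0;
	return (0);
}
```
.Ed
.Pp
Under these assumptions, passing
.Sq igc__context__desc
yields
.Sq igc-context-desc .
.Pp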
The following example from the
.Xr igc 4D
driver shows a DTrace probe that provides four arguments and would be
accessible using the probe
.Ql sdt:::igc-context-desc :
.Bd -literal -offset indent
DTRACE_PROBE4(igc__context__desc, igc_t *, igc, igc_tx_ring_t *,
    ring, igc_tx_state_t *, tx, struct igc_adv_tx_context_desc *,
    ctx);
.Ed
.Pp
In the above example,
.Fa igc ,
.Fa ring ,
.Fa tx ,
and
.Fa ctx
are local variables and function parameters.
.Pp
By default, SDT probes are considered
.Sy Volatile ;
in other words, they can change or disappear at any time.
This is intended to encourage widespread use of SDT probes for whatever may be
useful for a particular problem or issue that is being investigated.
SDT probes that are stabilized are transformed into their own first
class provider.
.Sh SEE ALSO
.Xr Intro 9 ,
.Xr Intro 9F ,
.Xr Intro 9S