.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2024 Oxide Computer Company
.\"
.Dd May 23, 2024
.Dt INTRO 9E
.Os
.Sh NAME
.Nm Intro
.Nd introduction to device driver entry points
.Sh DESCRIPTION
Section 9E of the manual describes the entry points and building blocks that are
used to build and implement all kinds of device drivers and kernel modules.
Modules and device drivers are often talked about interchangeably.
The operating system is built around the idea of loadable kernel modules.
Device drivers are the primary type that we think about; however, there are
loadable kernel modules for file systems, STREAMS devices, and even system
calls!
.Pp
The vast majority of this section focuses on documenting device
.Pq and STREAMS
drivers.
Device drivers are further broken down into different categories depending on
what they are targeting.
For example, there are dedicated frameworks for SCSI/SAS HBA drivers, networking
drivers, USB drivers, and then general character and block device drivers.
While most of the time we think about device drivers as corresponding to a piece
of physical hardware, there are also pseudo-device drivers, which are device
drivers that provide functionality but aren't backed by any hardware.
For example,
.Xr dtrace 4D
and
.Xr lofi 4D
are both pseudo-device drivers.
.Pp
To help understand the relationship between these different types of things,
consider the following image:
.Bd -literal
  +--------------------+
  |                    |
  |  Loadable Modules  |
  |                    |
  +--------------------+
    |                          +--------------+      +------------+
    |                          |              |      |            |
    +------------------------->| Cryptography | ...  | Scheduling |  ...
    |                          |              |      |            |
    |                          +--------------+      +------------+
    |   +----------------+     +--------------+     +--------------+
    |   |                |     |              |     |              |
    +-->| Device Drivers | ... | File Systems | ... | System Calls | ...
        |                |     |              |     |              |
        +----------------+     +--------------+     +--------------+
                v
    +-----------+
    |
    |   +------------+  +---------+     +-----------+     +-----------+
    +-->| Networking |->| igb(4D) | ... | mlxcx(4D) | ... | cxgbe(4D) | ...
    |   +------------+  +---------+     +-----------+     +-----------+
    |
    |   +-------+       +----------+     +-------------+     +----------+
    +-->|  HBA  |------>| smrt(4D) | ... | mpt_sas(4D) | ... | ahci(4D) | ...
    |   +-------+       +----------+     +-------------+     +----------+
    |
    |   +-------+       +--------------+     +----------+     +---------+
    +-->|  USB  |------>| scsa2usb(4D) | ... | ccid(4D) | ... | hid(4D) | ...
    |   +-------+       +--------------+     +----------+     +---------+
    |
    |   +---------+     +-------------+     +-------------+
    +-->| Sensors |---->| smntemp(4D) | ... | pchtemp(4D) | ...
    |   +---------+     +-------------+     +-------------+
    |
    +-------+-------------+-----------+----------+
            |             v           v          |
            v       +-----------+  +-----+       v
        +-------+   | Character |  | USB |   +-------+
        | Audio |   | and Block |  | HCD |   | Nexus |  ...
        +-------+   |  Devices  |  +-----+   +-------+
                    +-----------+
.Ed
.Pp
The above diagram attempts to explain, at a high level, some of the
relationships that were just mentioned.
All device drivers are loadable modules that leverage the
.Xr modldrv 9S
structure and implement similar
.Xr _init 9E
and
.Xr _fini 9E
entry points.
.Pp
Some hardware implements more than one type of thing.
The most common example here would be a NIC that implements a temperature sensor
or a current sensor.
Many devices also implement and leverage the kernel statistics framework called
.Dq kstats .
A device driver is not strictly limited to only a single class of thing.
For example, many USB client devices are networking device drivers.
In the subsequent sections we'll go into the functions and structures that are
related to creating the different device drivers and their associated
functions.
.Ss Kernel Initialization
To begin with, all loadable modules in the system are required to implement
three entry points.
If these entry points are not present, then the module cannot be installed in
the system.
These entry points are
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E .
.Pp
The
.Xr _init 9E
entry point will be the first thing called in the module and this is where
any global initialization should be taken care of.
Once all global state has been successfully created, the driver should call
.Xr mod_install 9F
to actually register with the system.
Conversely,
.Xr _fini 9E
is used to tear down the module.
The driver uses
.Xr mod_remove 9F
to first remove the driver from the system and then it can tear down any global
state that was created in
.Xr _init 9E .
.Pp
While we mention global state here, this isn't widely used in most device
drivers.
A device driver can have multiple instances instantiated, one for each instance
of a hardware device that is found, and most state is tied to those instances.
We'll discuss that more in the next section.
.Pp
The
.Xr _info 9E
entry point generally just calls
.Xr mod_info 9F
and returns its result.
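.Pp
As a rough, userland-only sketch of the ordering described above, the following
models global setup and teardown around registration.
The
.Fn sketch_mod_install
and
.Fn sketch_mod_remove
functions are hypothetical stand-ins for
.Xr mod_install 9F
and
.Xr mod_remove 9F ,
and none of the names below come from a real driver.
.Bd -literal
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical global state that a module would create in _init(). */
static int *sketch_global_table;

/* Knobs that choose what the stand-in framework calls return. */
static int sketch_install_ret;
static int sketch_remove_ret;

/* Stand-ins for mod_install(9F) and mod_remove(9F); 0 is success. */
static int
sketch_mod_install(void)
{
        return (sketch_install_ret);
}

static int
sketch_mod_remove(void)
{
        return (sketch_remove_ret);
}

/* Models _init(9E): create global state first, then register. */
static int
sketch_init(void)
{
        int ret;

        sketch_global_table = calloc(16, sizeof (int));
        if (sketch_global_table == NULL)
                return (ENOMEM);

        if ((ret = sketch_mod_install()) != 0) {
                /* Registration failed, so undo the global setup. */
                free(sketch_global_table);
                sketch_global_table = NULL;
        }
        return (ret);
}

/* Models _fini(9E): unregister first, tear down only on success. */
static int
sketch_fini(void)
{
        int ret;

        if ((ret = sketch_mod_remove()) == 0) {
                free(sketch_global_table);
                sketch_global_table = NULL;
        }
        return (ret);
}
```
.Ed
The property worth noticing is that a failed registration or a failed removal
leaves the global state exactly as it was before the call.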
.Pp
All of these entry points directly or indirectly require a
.Vt "struct modlinkage" .
This structure is used by all types of loadable kernel modules and is filled in
with information that varies based on the type of module one is creating.
Here, everything that we're creating is going to use a
.Vt "struct modldrv" ,
which describes a loadable driver.
Every device driver will declare a static global variable for these and fill
them out.
They are documented in
.Xr modlinkage 9S
and
.Xr modldrv 9S
respectively.
.Pp
The following is an example of these structures borrowed from
.Xr igc 4D :
.Bd -literal
static struct modldrv igc_modldrv = {
        .drv_modops = &mod_driverops,
        .drv_linkinfo = "Intel I226/226 Ethernet Controller",
        .drv_dev_ops = &igc_dev_ops
};

static struct modlinkage igc_modlinkage = {
        .ml_rev = MODREV_1,
        .ml_linkage = { &igc_modldrv, NULL }
};
.Ed
.Pp
From this there are a few important things to take away.
A single kernel module may implement more than one type of linkage, though this
is the exception and not the norm.
The second thing to call out here is that while the
.Fa drv_modops
will be the same for all drivers that use the
.Vt "struct modldrv" ,
the
.Fa drv_linkinfo
and
.Fa drv_dev_ops
will be unique to each driver.
The sections that follow discuss device instances and then the
.Vt "struct dev_ops" .
.Ss The Devices Tree and Instances
Device drivers have a unique challenge that makes them different from other
kinds of loadable modules: there may very well be more than a single instance of
the hardware that they support.
Consider a few examples: a user can plug in two distinct USB mass storage
devices or keyboards.
A system may have more than one NIC present or the hardware may expose multiple
physical ports as distinct devices.
Many systems have more than one disk device.
Conversely, if a given piece of hardware isn't present then there's no reason
for the driver for it to be loaded.
There is nothing that the Intel 1 GbE Ethernet NIC driver,
.Xr igb 4D ,
can do if there are no supported devices plugged in.
.Pp
Devices are organized into a tree that is full of parent and child
relationships.
This tree is what you see when you run
.Xr prtconf 8 .
As an example, a USB device is plugged into a port on a hub, which may be
plugged into another hub, and then is eventually plugged into a PCI device that
is the USB host controller, which itself may be under a PCI-PCI bridge, and this
chain continues all the way up to the root of the tree, which we call
.Dq rootnex .
Device drivers that can enumerate children and provide operations for them are
called
.Dq nexus
drivers.
.Pp
The system automatically fills out the device tree through a combination of
built-in mechanisms and through operations on other nexus drivers.
When a new hardware unit is discovered, a
.Vt dev_info_t
structure, the device information, is created for it and it is linked into the
tree.
Generally, the system can then use information embedded in the device
to determine what driver is responsible for the piece of hardware through the
use of the
.Dq compatible
property, which the system and nexus drivers set up on their children.
For example, PCI and PCIe drivers automatically set up the compatible property
based on information discovered in PCI configuration space like the device's
vendor ID, device ID, and class IDs.
The same is true of USB.
.Pp
When a device driver is packaged, it contains metadata that indicates which
devices it supports.
For example, the aforementioned igb driver will have a rule that it matches
.Dq pciex8086,10a7 .
When the kernel discovers a device with this alias present, it will know that it
should assign it to the igb driver and then it will assign the
.Vt dev_info_t
structure a new instance number.
.Pp
To emphasize here, each time the device is discovered in the tree, it will have
an independent instance number and an independent
.Vt dev_info_t
that accompanies it.
Each instance has an independent lifetime too.
The most obvious way to think about this is with something that can be
physically removed while the system is on, like a USB device.
Just because you pull one USB keyboard doesn't mean it impacts the other one
there.
They are inherently different devices
.Po
though if they were plugged into the same hub and the hub was removed, then they
would both be removed; however, each would be acted on independently
.Pc .
.Pp
Here is a slimmed down example from a system's
.Xr prtconf 8
output:
.Bd -literal
Oxide,Gimlet (driver name: rootnex)
    scsi_vhci, instance #0 (driver name: scsi_vhci)
    pci, instance #0 (driver name: npe)
        pci1022,1480, instance #13 (driver name: amdzen_stub)
        pci1022,164f
        pci1022,1482
        pci1de,fff9, instance #0 (driver name: pcieb)
            pci1344,3100, instance #4 (driver name: nvme)
                blkdev, instance #10 (driver name: blkdev)
        pci1022,1482
        pci1022,1482
        pci1de,fff9, instance #1 (driver name: pcieb)
            pci1b96,0, instance #7 (driver name: nvme)
                blkdev, instance #0 (driver name: blkdev)
        pci1de,fff9, instance #2 (driver name: pcieb)
            pci1b96,0, instance #8 (driver name: nvme)
                blkdev, instance #4 (driver name: blkdev)
        pci1de,fff9, instance #3 (driver name: pcieb)
            pci1b96,0, instance #10 (driver name: nvme)
                blkdev, instance #1 (driver name: blkdev)
.Ed
.Pp
From this we can see that there are multiple instances of the NVMe
.Pq nvme ,
PCIe bridge
.Pq pcieb ,
and
generic block device
.Pq blkdev
drivers present.
Each of these has its own
.Vt dev_info_t
and has its various entry points called in parallel.
With that, let's dig into the specifics of what the
.Vt "struct dev_ops"
actually is and the different operations to be aware of.
.Ss struct dev_ops
The device operations structure,
.Vt "struct dev_ops" ,
controls all of the basic entry points that a loadable device driver contains.
This is something that every driver has to implement, no matter the type.
The most important things that will be present are the
.Fa devo_attach
and
.Fa devo_detach
members, which are used to create and destroy instances of the driver, and
pointers to any subsequent operations that exist, such as the
.Fa devo_cb_ops ,
which is used for character and block device drivers, and the
.Fa devo_bus_ops ,
which is used for nexus drivers.
.Pp
Attach and detach are the most important entry points in this structure.
Attach can practically be thought of as the
.Dq main
function entry point for a device driver.
This is where any initialization of the instance will occur.
This would include many traditional things like setting up access to registers,
allocating and assigning interrupts, and interfacing with the various other
device driver frameworks such as
.Xr mac 9E .
.Pp
The actions taken here are generally device-specific, though certain classes of
devices
.Pq e.g. PCI, USB, etc.
will have overlapping concerns.
In addition, this is where the driver will take care of creating anything like a
minor node, which userland software will use to access it if it's a
character or block device driver.
.Pp
There is generally a per-instance data structure that a driver creates.
It may do this by calling
.Xr kmem_zalloc 9F
and assigning the structure with the
.Xr ddi_set_driver_private 9F
function or it may use the DDI's soft state management functions rooted in
.Xr ddi_soft_state_init 9F .
A driver should tie as much state to the instance as possible.
There should not be anything like a fixed size global array of possible
instances.
Someone usually finds a way to attach many more instances of some type of
hardware than you might expect!
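.Pp
To make the advice about avoiding a fixed size array concrete, here is a
minimal userland sketch of a table of per-instance state that grows on demand.
It is only loosely inspired by the soft state idea; the
.Vt sketch_inst_t
and
.Vt sketch_table_t
types and all of the function names are hypothetical and are not the DDI
interfaces themselves.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-instance state for an imaginary driver. */
typedef struct sketch_inst {
        int si_instance;
        int si_nintrs;
} sketch_inst_t;

/* A table of instance pointers that grows as instances attach. */
typedef struct sketch_table {
        sketch_inst_t **st_slots;
        size_t st_nslots;
} sketch_table_t;

/* Allocate zeroed state for an instance; returns 0 or -1. */
static int
sketch_state_zalloc(sketch_table_t *tbl, size_t inst)
{
        /* Keep the doubling and the size computation from overflowing. */
        if (inst >= SIZE_MAX / (2 * sizeof (sketch_inst_t *)))
                return (-1);

        if (inst >= tbl->st_nslots) {
                size_t n = (tbl->st_nslots == 0) ? 4 : tbl->st_nslots;
                sketch_inst_t **slots;

                while (n <= inst)
                        n *= 2;
                slots = realloc(tbl->st_slots, n * sizeof (*slots));
                if (slots == NULL)
                        return (-1);
                for (size_t i = tbl->st_nslots; i < n; i++)
                        slots[i] = NULL;
                tbl->st_slots = slots;
                tbl->st_nslots = n;
        }

        if (tbl->st_slots[inst] != NULL)
                return (-1);    /* this instance already has state */
        tbl->st_slots[inst] = calloc(1, sizeof (sketch_inst_t));
        if (tbl->st_slots[inst] == NULL)
                return (-1);
        tbl->st_slots[inst]->si_instance = (int)inst;
        return (0);
}

/* Look up the state for an instance; NULL if there is none. */
static sketch_inst_t *
sketch_state_get(const sketch_table_t *tbl, size_t inst)
{
        return (inst < tbl->st_nslots ? tbl->st_slots[inst] : NULL);
}

/* Release the state for an instance, as a detach would. */
static void
sketch_state_free(sketch_table_t *tbl, size_t inst)
{
        if (inst < tbl->st_nslots) {
                free(tbl->st_slots[inst]);
                tbl->st_slots[inst] = NULL;
        }
}
```
.Ed
Nothing in the table caps the instance number ahead of time, which is the point:
instance 37 works as well as instance 0.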
.Pp
The
.Xr attach 9E
and
.Xr detach 9E
entry points both have a unique command argument that is used to describe a
specific action that is going on.
This action may be a normal attach or it could be related to putting the system
into the ACPI S3 sleep or similar state with the suspend and resume commands.
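.Pp
The command argument usually turns these entry points into a switch statement.
The following userland sketch shows that shape with hypothetical stand-in
types; the
.Dv SK_
names are invented for illustration, and the state checks are one plausible
policy rather than a requirement of the framework.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the attach(9E) and detach(9E) command arguments. */
typedef enum { SK_CMD_ATTACH, SK_CMD_RESUME } sk_attach_cmd_t;
typedef enum { SK_CMD_DETACH, SK_CMD_SUSPEND } sk_detach_cmd_t;

/* Stand-ins for the success and failure return values. */
enum { SK_SUCCESS = 0, SK_FAILURE = -1 };

/* Hypothetical per-instance state. */
typedef struct sk_dev {
        bool sd_attached;
        bool sd_suspended;
} sk_dev_t;

static int
sk_attach(sk_dev_t *dev, sk_attach_cmd_t cmd)
{
        switch (cmd) {
        case SK_CMD_ATTACH:
                /* Normal attach: set up the instance from scratch. */
                if (dev->sd_attached)
                        return (SK_FAILURE);
                dev->sd_attached = true;
                dev->sd_suspended = false;
                return (SK_SUCCESS);
        case SK_CMD_RESUME:
                /* Resume: only valid for a suspended instance. */
                if (!dev->sd_attached || !dev->sd_suspended)
                        return (SK_FAILURE);
                dev->sd_suspended = false;
                return (SK_SUCCESS);
        default:
                return (SK_FAILURE);
        }
}

static int
sk_detach(sk_dev_t *dev, sk_detach_cmd_t cmd)
{
        switch (cmd) {
        case SK_CMD_DETACH:
                /* Normal detach: tear the instance down entirely. */
                if (!dev->sd_attached || dev->sd_suspended)
                        return (SK_FAILURE);
                dev->sd_attached = false;
                return (SK_SUCCESS);
        case SK_CMD_SUSPEND:
                /* Suspend: keep the instance, but stop using hardware. */
                if (!dev->sd_attached || dev->sd_suspended)
                        return (SK_FAILURE);
                dev->sd_suspended = true;
                return (SK_SUCCESS);
        default:
                return (SK_FAILURE);
        }
}
```
.Ed
Rejecting unknown commands in the default case keeps a driver from silently
claiming support for an action it does not implement.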
.Pp
The following table lists the common
.Vt "struct dev_ops"
entry points that most drivers end up having to think a little bit about:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr attach 9E Ta Xr detach 9E
.It Xr getinfo 9E Ta Xr quiesce 9E
.El
.Pp
Briefly, the
.Xr getinfo 9E
entry point is used to map between instances of a device driver and the minor
nodes it creates.
Drivers that participate in a framework like the SCSI HBA or networking
frameworks don't usually end up implementing this.
However, drivers that manually create minor nodes generally do.
The
.Xr quiesce 9E
entry point is used as part of the fast reboot operation.
It is basically intended to stop and/or reset the hardware and discard any
ongoing I/O.
Pseudo-device drivers and drivers which do not perform I/O can use the
symbol
.Ql ddi_quiesce_not_needed
in lieu of a standard implementation.
.Pp
In addition, the following entry points exist, but are less commonly
required because the system generally takes care of them, as is the case with
.Xr probe 9E :
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr identify 9E Ta Xr power 9E
.It Xr probe 9E Ta
.El
.Pp
For more information on the structure, see also
.Xr dev_ops 9S .
The following are a few examples of the
.Vt "struct dev_ops"
structure from a few drivers.
We recommend using the C99 style for all new instances.
.Bd -literal
static struct dev_ops ksensor_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = ksensor_getinfo,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = ksensor_attach,
        .devo_detach = ksensor_detach,
        .devo_reset = nodev,
        .devo_power = ddi_power,
        .devo_quiesce = ddi_quiesce_not_needed,
        .devo_cb_ops = &ksensor_cb_ops
};

static struct dev_ops igc_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = NULL,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = igc_attach,
        .devo_detach = igc_detach,
        .devo_reset = nodev,
        .devo_quiesce = ddi_quiesce_not_supported,
        .devo_cb_ops = &igc_cb_ops
};

static struct dev_ops pchtemp_dev_ops = {
        .devo_rev = DEVO_REV,
        .devo_refcnt = 0,
        .devo_getinfo = nodev,
        .devo_identify = nulldev,
        .devo_probe = nulldev,
        .devo_attach = pchtemp_attach,
        .devo_detach = pchtemp_detach,
        .devo_reset = nodev,
        .devo_quiesce = ddi_quiesce_not_needed
};
.Ed
.Ss Character and Block Operations
In the history of UNIX, the most common device drivers that were created were
for block and character devices.
The interfaces in block and character devices are usually in service of common
I/O patterns that the system exposes.
For example, when you call
.Xr open 2 ,
.Xr ioctl 2 ,
or
.Xr read 2
on a device, it goes through the device's corresponding entry point here.
Both block and character devices operate on the shared
.Vt "struct cb_ops"
structure, with different members being expected for each of them.
While they both require that someone implement the
.Fa cb_open
and
.Fa cb_close
members, block devices perform I/O through the
.Xr strategy 9E
entry point and support the
.Xr dump 9E
entry point for kernel crash dumps, while character devices implement the more
historically familiar
.Xr read 9E ,
.Xr write 9E ,
and the
.Xr devmap 9E
entry point for supporting memory-mapping.
.Pp
While the device operations structure works with the
.Vt dev_info_t
structure, of which there is one per instance, character and block operations
work with minor nodes: named entities that exist in the file system.
UNIX has long had the idea of a major and minor number that is encoded in the
.Vt dev_t
which is embedded in the file system, which is what you see in the
.Fa st_rdev
member of the stat structure when you call
.Xr stat 2 .
The major number is assigned to the driver
.Em as a whole ,
not an instance.
The minor number space is shared between all instances of a driver.
Minor node numbers are assigned by the driver when it calls
.Xr ddi_create_minor_node 9F
to create a minor node.
When one of its character or block entry points is
called, it will get this minor number back and it must translate it to the
corresponding instance on its own.
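.Pp
One way a driver might make that translation cheap is to pack the instance
number and a per-instance node index into the minor number itself.
The following sketch assumes a purely hypothetical split in which the low four
bits select the node; the macro and function names are invented for
illustration and a real driver would choose a layout that suits its own needs.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the low bits pick a node within an instance. */
#define SK_NODE_BITS    4U
#define SK_NODE_MASK    ((1U << SK_NODE_BITS) - 1U)

/* Build the minor number a driver would pass when creating a node. */
static uint32_t
sk_minor_encode(uint32_t inst, uint32_t node)
{
        assert(node <= SK_NODE_MASK);
        assert(inst <= (UINT32_MAX >> SK_NODE_BITS));
        return ((inst << SK_NODE_BITS) | node);
}

/* Recover the instance when an entry point hands the minor back. */
static uint32_t
sk_minor_to_instance(uint32_t minor)
{
        return (minor >> SK_NODE_BITS);
}

/* Recover which of the instance's nodes was opened. */
static uint32_t
sk_minor_to_node(uint32_t minor)
{
        return (minor & SK_NODE_MASK);
}
```
.Ed
With this layout, node 2 of instance 3 is minor number 50, and both halves can
be recovered without any lookup structure.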
.Pp
A special property of the
.Xr open 9E
entry point is that it can change the minor number a client gets during its call
to open, which the client will then use for all subsequent calls.
This is called a
.Dq cloning
open.
Whether this is used or not depends on the type of driver that you are creating.
For example, many pseudo-device drivers like DTrace will use this so each client
has its own state.
Similarly, devices that have certain internal locking and transaction schemes
will give each caller a unique minor.
The
.Xr ccid 4D
and
.Xr nvme 4D
drivers are examples of this.
However, many drivers will have just a single minor node per instance and just
say that the minor node's number is the instance number, making it very simple
to figure out the mapping.
When it's not so simple, often an AVL tree or some other structure is used to
help map this together.
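.Pp
As a small userland model of a cloning open, the sketch below hands each open a
fresh minor number and keeps per-open state that later calls look up by that
number.
A linked list stands in for the AVL tree mentioned above, minors are never
reused, and every name here is hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-open state, keyed by the minor handed out at open. */
typedef struct sk_client {
        uint32_t sc_minor;
        int sc_instance;
        struct sk_client *sc_next;
} sk_client_t;

static sk_client_t *sk_clients;
/* Minor 0 is assumed to be the node created at attach time. */
static uint32_t sk_next_minor = 1;

/* Models a cloning open: returns a fresh minor, or 0 on failure. */
static uint32_t
sk_clone_open(int instance)
{
        sk_client_t *c;

        if (sk_next_minor == UINT32_MAX)
                return (0);     /* out of numbers; no reuse here */
        if ((c = calloc(1, sizeof (*c))) == NULL)
                return (0);
        c->sc_minor = sk_next_minor++;
        c->sc_instance = instance;
        c->sc_next = sk_clients;
        sk_clients = c;
        return (c->sc_minor);
}

/* Translates a minor back to its per-open state. */
static sk_client_t *
sk_client_lookup(uint32_t minor)
{
        for (sk_client_t *c = sk_clients; c != NULL; c = c->sc_next) {
                if (c->sc_minor == minor)
                        return (c);
        }
        return (NULL);
}

/* Models close: drops the per-open state; returns 0 or -1. */
static int
sk_clone_close(uint32_t minor)
{
        for (sk_client_t **cp = &sk_clients; *cp != NULL;
            cp = &(*cp)->sc_next) {
                if ((*cp)->sc_minor == minor) {
                        sk_client_t *c = *cp;

                        *cp = c->sc_next;
                        free(c);
                        return (0);
                }
        }
        return (-1);
}
```
.Ed
Two opens of the same instance get different minors, so each client's state
stays separate even though both refer to the same hardware.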
.Pp
The following entry points are generally used for character devices:
.Bl -tag -width Ds
.It Xr ioctl 9E
The I/O control or ioctl entry point is used extensively throughout the system
to perform different kinds of operations.
These operations are often driver specific, though there are also some
common operations that are used across multiple devices like the disk
operations described in
.Xr dkio 4I
or the ioctls that are used under the hood by
.Xr cfgadm 8
and friends.
.Pp
Whether a driver supports ioctls or not is the driver's choice.
If it does, the driver must always perform any requisite privilege and
permission checking as well as take care in copying in and out any kind of
memory from the user process through calls like
.Xr ddi_copyin 9F
and
.Xr ddi_copyout 9F .
.Pp
The ioctl interface gives the driver writer great flexibility to create equally
useful or hard to consume interfaces.
When crafting a new committed interface over an ioctl, take care to ensure there
is an ability to version the structure or use something that has more
flexibility like a
.Vt nvlist_t .
See the
.Sq Copying Data to and from Userland
section of
.Xr Intro 9F
for more information.
.It Xr read 9E , Xr write 9E , Xr aread 9E , and Xr awrite 9E
These are the classic I/O routines of the system.
A driver's read and write routines operate on a
.Xr uio 9S
structure which describes the I/O that is occurring, the offset into the
device that the I/O should occur at, and has various flags that
describe properties of the I/O request, such as whether or not it is a
non-blocking request.
.Pp
The majority of device drivers that implement these entry points are using them
to create some kind of file-like abstraction for a device.
For example, the
.Xr ccid 4D
driver uses these interfaces for submitting commands and reading responses back
from an underlying device.
.Pp
For most use cases
.Xr read 9E
and
.Xr write 9E
are sufficient; however, the
.Xr aread 9E
and
.Xr awrite 9E
entry points are versions that tie into the kernel's asynchronous I/O engine.
.It Xr chpoll 9E
This entry point allows a device to be polled by user code for an event of
interest and connects through the kernel to different polling mechanisms such as
.Xr poll 2 ,
.Xr port_get 3C ,
and many others.
Currently this interface only allows a driver to define the classic poll style
events such as
.Dv POLLIN ,
.Dv POLLOUT ,
and
.Dv POLLHUP .
The exact semantics of these are up to the driver; however, it is expected that
the read and write oriented semantics of the various events will be honored by
the device driver.
.It Xr devmap 9E and Xr segmap 9E
These are entry points that are used to set up memory mappings for a device and
replace the older
.Xr mmap 9E
entry point.
When a process calls
.Xr mmap 2
on a device, it'll reach these, starting with the
.Xr devmap 9E
entry point.
The driver is responsible for confirming that the mapping request and its
semantics are sensible, after which it will set up memory for consumption.
The
.Xr devmap 9E
manual page has more details on the specifics here and the related entry points
that can be implemented as part of the
.Xr devmap_callback_ctl 9S
structure such as
.Xr devmap_access 9E .
The segment mapping is an optional part that provides some additional controls
for a driver such as assigning certain mapping attributes or wanting to maintain
separate contexts for different mappings.
See
.Xr segmap 9E
for more information.
It is common for drivers to just provide a
.Xr devmap 9E
entry point.
.It Xr prop_op 9E
This entry point is used for drivers to manage and deal with property creation.
While this is its own entry point, most drivers can just specify
.Xr ddi_prop_op 9F
for this and don't need any special handling.
.El
.Pp
The following entry points are unique to block devices:
.Bl -tag -width Ds
.It Xr strategy 9E
A driver's strategy entry point is used to actually perform I/O as described by
the
.Xr buf 9S
structure.
It is responsible for allocating all resources and then initiating the actual
request.
The actual request will finish, potentially asynchronously, through calls to
.Xr biodone 9F
or
.Xr bioerror 9F .
HBA or blkdev-based drivers do not usually end up implementing this interface.
.It Xr dump 9E
A driver's dump implementation is used when the operating system has had a fatal
error and is trying to persist a crash dump to disk.
This is a delicate operation as the system has already failed, which means many
normal operations like interrupt handlers, timeouts, and blocking will no longer
work.
.El
.Pp
In general, the
.Xr print 9E
entry point for block devices is vestigial and drivers should fill in
.Xr nodev 9F
there instead.
.Pp
The following are some examples of different character and block device
operations structures that drivers have employed.
Note that using C99 structure definitions is preferred:
.Bd -literal
static struct cb_ops ksensor_cb_ops = {
        .cb_open = ksensor_open,
        .cb_close = ksensor_close,
        .cb_strategy = nodev,
        .cb_print = nodev,
        .cb_dump = nodev,
        .cb_read = nodev,
        .cb_write = nodev,
        .cb_ioctl = ksensor_ioctl,
        .cb_devmap = nodev,
        .cb_mmap = nodev,
        .cb_segmap = nodev,
        .cb_chpoll = nochpoll,
        .cb_prop_op = ddi_prop_op,
        .cb_flag = D_MP,
        .cb_rev = CB_REV,
        .cb_aread = nodev,
        .cb_awrite = nodev
};

static struct cb_ops vio9p_cb_ops = {
        .cb_rev =                       CB_REV,
        .cb_flag =                      D_NEW | D_MP,
        .cb_open =                      vio9p_open,
        .cb_close =                     vio9p_close,
        .cb_read =                      vio9p_read,
        .cb_write =                     vio9p_write,
        .cb_ioctl =                     vio9p_ioctl,
        .cb_strategy =                  nodev,
        .cb_print =                     nodev,
        .cb_dump =                      nodev,
        .cb_devmap =                    nodev,
        .cb_mmap =                      nodev,
        .cb_segmap =                    nodev,
        .cb_chpoll =                    nochpoll,
        .cb_prop_op =                   ddi_prop_op,
        .cb_str =                       NULL,
        .cb_aread =                     nodev,
        .cb_awrite =                    nodev,
};

static struct cb_ops bd_cb_ops = {
        bd_open,                /* open */
        bd_close,               /* close */
        bd_strategy,            /* strategy */
        nodev,                  /* print */
        bd_dump,                /* dump */
        bd_read,                /* read */
        bd_write,               /* write */
        bd_ioctl,               /* ioctl */
        nodev,                  /* devmap */
        nodev,                  /* mmap */
        nodev,                  /* segmap */
        nochpoll,               /* poll */
        bd_prop_op,             /* cb_prop_op */
        0,                      /* streamtab  */
        D_64BIT | D_MP,         /* Driver compatibility flag */
        CB_REV,                 /* cb_rev */
        bd_aread,               /* async read */
        bd_awrite               /* async write */
};
.Ed
.Ss Networking Drivers
Networking device drivers come in many forms and flavors.
They may interface to the host via PCIe or USB, be a pseudo-device, or use
something entirely different like SPI
.Pq Serial Peripheral Interface .
The system provides a dedicated networking interface driver framework that is
documented in
.Xr mac 9E .
This framework is sometimes also referred to as GLDv3
.Pq Generic LAN Device version 3 .
.Pp
All networking drivers will still implement a basic
.Vt "struct dev_ops"
and a minimal
.Vt "struct cb_ops" .
The
.Xr mac 9E
framework takes care of implementing all of the standard character device entry
points and instead provides a number of different
networking-specific entry points that take care of things like getting and
setting properties, installing and removing MAC addresses and filters, and
actually transmitting and providing callbacks for receiving packets.
.Pp
Each instance of a device driver will generally have a separate registration
with
.Xr mac 9E .
In other words, there is usually a one-to-one relationship between a driver
having its
.Xr attach 9E
entry point called and it registering with the
.Xr mac 9E
framework.
.Ss STREAMS Modules
STREAMS modules are a historical way to provide certain services in the kernel.
For networking device drivers, instead see the prior section and
.Xr mac 9E .
Conceptually, STREAMS break things into queues, with one side being designed for
a module to read data and another side for it to write or produce data.
These modules are arranged in a stack, with additional modules being pushed on
for additional processing.
For example, the TTY subsystem has a serial console as a base STREAMS module,
but it then pushes on additional modules like the pseudo-terminal emulation
.Po
.Xr ptem 4M
.Pc ,
the standard line discipline
.Po
.Xr ldterm 4M
.Pc ,
etc.
.Pp
STREAMS drivers don't use the normal character device entry points
.Pq though sometimes they do define them
or even the
.Vt "struct modldrv" .
Instead they use the
.Vt "struct modlstrmod"
which is discussed in
.Xr modlstrmod 9S ,
which in turn requires one to fill out the
.Xr fmodsw 9S ,
.Xr streamtab 9S ,
and
.Xr qinit 9S
structures.
The last of these has two of the more common entry points:
.Bl -column -offset indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr put 9E Ta Xr srv 9E
.El
.Pp
These entry points are used when different kinds of messages are received by the
device driver on a queue.
In addition, the
.Xr qinit 9S
structure defines an alternative set of entry points for
.Xr open 9E
and
.Xr close 9E ,
as STREAMS module open and close routines all operate in the context of a given
.Vt queue_t .
There are other differences here.
An ioctl is not a dedicated entry point, but rather a specific message type
.Po
.Dv M_IOCTL
.Pc
that is
received in a driver's
.Xr put 9E
routine.
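.Pp
To illustrate the idea that an ioctl is just another message, the following
userland sketch models a put routine that dispatches on message type and a
service routine that drains deferred work.
The types and the
.Dv SK_M_
constants are hypothetical stand-ins, and the choice to handle ioctls
immediately while deferring data is an assumption made for the example; real
modules divide the work in whatever way suits them.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for STREAMS message types such as M_DATA and M_IOCTL. */
typedef enum { SK_M_DATA, SK_M_IOCTL } sk_msg_type_t;

typedef struct sk_msg {
        sk_msg_type_t sm_type;
        struct sk_msg *sm_next;
} sk_msg_t;

/* A toy queue: a list of deferred messages plus a counter. */
typedef struct sk_queue {
        sk_msg_t *sq_head;
        sk_msg_t *sq_tail;
        int sq_nioctls;
} sk_queue_t;

/* Models a put routine: look at the message type and dispatch. */
static void
sk_put(sk_queue_t *q, sk_msg_t *mp)
{
        switch (mp->sm_type) {
        case SK_M_IOCTL:
                /* The ioctl arrives as a message and is handled here. */
                q->sq_nioctls++;
                break;
        case SK_M_DATA:
        default:
                /* Defer everything else to the service routine. */
                mp->sm_next = NULL;
                if (q->sq_tail == NULL)
                        q->sq_head = mp;
                else
                        q->sq_tail->sm_next = mp;
                q->sq_tail = mp;
                break;
        }
}

/* Models a service routine: drain deferred messages in order. */
static int
sk_srv(sk_queue_t *q)
{
        int n = 0;

        while (q->sq_head != NULL) {
                sk_msg_t *mp = q->sq_head;

                q->sq_head = mp->sm_next;
                mp->sm_next = NULL;
                n++;
        }
        q->sq_tail = NULL;
        return (n);
}
```
.Ed
The return value of the service routine model is simply the number of messages
it processed, which makes the deferral easy to observe.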
.Pp
Finally, it's worth noting the
.Xr mt-streams 9F
manual page which discusses several concurrency related considerations for
STREAMS related drivers.
.Ss HBA Drivers
Host bus adapters are used to interface with the various SCSI and SAS
controllers.
Like with networking, the kernel provides a framework under the name of SCSA.
HBA drivers still often implement character device entry points; however, they
generally end up calling into shared framework entry points for
.Xr open 9E ,
.Xr ioctl 9E ,
and
.Xr close 9E .
For several of the concepts related to the third version of the framework, see
.Xr iport 9 .
.Pp
The following entry points are associated with HBA drivers:
.Bl -column -offset -indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr tran_abort 9E Ta Xr tran_bus_reset 9E
.It Xr tran_dmafree 9E Ta Xr tran_getcap 9E
.It Xr tran_init_pkt 9E Ta Xr tran_quiesce 9E
.It Xr tran_reset 9E Ta Xr tran_reset_notify 9E
.It Xr tran_setup_pkt 9E Ta Xr tran_start 9E
.It Xr tran_sync_pkt 9E Ta Xr tran_tgt_free 9E
.It Xr tran_tgt_init 9E Ta Xr tran_tgt_probe 9E
.El
.Pp
In addition to these, when using SCSAv3 with iports, drivers will call
.Xr scsi_hba_iport_register 9F
to create various iports.
This has the unique effect of causing the driver's top-level
.Xr attach 9E
entry point to be called again, but referring to the iport instead of the main
hardware instance.
.Ss USB Drivers
The kernel provides a framework for USB client devices to access various USB
services such as getting access to device and configuration descriptors, issuing
control, bulk, interrupt, and isochronous requests, and being notified when they
are removed from the system.
Generally a USB device driver leverages a framework of some kind, like
.Xr mac 9E
in addition to the USB pieces.
As such, there are no entry points specific to USB device drivers; however,
there are plenty of provided functions.
.Pp
To get started with a USB device driver, one will generally perform some of the
following steps:
.Bl -enum
.It
Register with the USB framework by calling
.Xr usb_client_attach 9F .
.It
Ask the kernel to fetch all of the device and class descriptors that are
appropriate with the
.Xr usb_get_dev_data 9F
function.
.It
Parse the relevant descriptors to figure out which endpoints to attach.
.It
Open up pipes to the specific USB endpoints by using
.Xr usb_lookup_ep_data 9F ,
.Xr usb_ep_xdescr_fill 9F ,
and
.Xr usb_pipe_xopen 9F .
.It
Proceed with the rest of device initialization and service.
.El
.Ss Sensors
Many devices embed sensors in them, such as a networking ASIC that tracks its
junction temperature.
The kernel provides the
.Xr ksensor 9E
.Pq kernel sensor
framework to allow device drivers to implement sensors with a minimal set of
callback functions.
Any device driver, whether it's providing services through another framework or
not, can implement the ksensor operations.
Drivers do not need to implement any character device operations directly.
They are instead provided via the
.Xr ksensor 4D
driver.
.Pp
A driver registers with the ksensor framework during its
.Xr attach 9E
entry point
and must implement the functions described in
.Xr ksensor_ops 9E
for each sensor that it creates.
These interfaces include:
.Bl -column -offset -indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr kso_kind 9E Ta Xr kso_scalar 9E
.El
.Ss Virtio Drivers
The kernel provides an uncommitted interface for Virtio device drivers, which is
discussed in some detail in
.Pa uts/common/io/virtio/virtio.h .
A client device driver first registers with the framework and then uses that
registration to begin feature and interrupt negotiation.
As part of that, it is given the ability to set up virtqueues which can be
used for communicating to and from the hypervisor.
.Ss Kernel Statistics
Drivers have the ability to export kstats
.Pq kernel statistics
that will appear in the
.Xr kstat 8
command.
Any kind of module in the system can create and register a kstat; it is not
strictly tied to anything like a
.Vt dev_info_t .
kstats come in several different types.
The most common kstat type is the
.Dv KSTAT_TYPE_NAMED
which allows for multiple, typed name-value pairs to be part of the stat.
This is what the kernel uses under the hood for many things such as the various
.Xr mac 9E
statistics that are managed on behalf of drivers.
.Pp
To create a kstat, a driver utilizes the
.Xr kstat_create 9F
function, after which it has a chance to set up the kstat and make choices about
which entry points it will implement.
A kstat will not be made visible until the caller calls
.Xr kstat_install 9F
on it.
The two entry points that a driver may implement are:
.Bl -column -offset -indent "mac_capab_transceiver" "mac_capab_transceiver"
.It Xr ks_snapshot 9E Ta Xr ks_update 9E
.El
.Pp
First, let's discuss the
.Xr ks_update 9E
entry point.
A kstat may be updated in one of two ways: either by having its
.Xr ks_update 9E
function called or by having the system update information as it goes in the
kstat's data.
One would use the former when it involves doing something like going out to
hardware and reading registers, whereas the latter approach might be used when
operations can be tracked as part of a normal flow, such as the number of errors
or particular requests a driver has encountered.
The
.Xr ks_snapshot 9E
entry point is not as commonly used by comparison and allows a caller to
interpose on the data marshalling process for copying out to userland.
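.Pp
The difference between the two update styles can be sketched in plain userland
C.
The example below is only a model under stated assumptions: the structures and
functions
.Po
.Vt demo_dev_t ,
.Fn demo_record_error ,
.Fn demo_stats_update
.Pc
are hypothetical and do not use the real kstat interfaces.
One counter is bumped inline as events occur, while the other is refreshed from
a simulated hardware register only when an update callback runs.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statistics a driver might export. */
typedef struct demo_stats {
	uint64_t ds_errors;	/* maintained inline as errors occur */
	uint64_t ds_rx_frames;	/* refreshed on demand from hardware */
} demo_stats_t;

typedef struct demo_dev {
	uint64_t dd_hw_rx_reg;	/* stand-in for a hardware register */
	demo_stats_t dd_stats;
} demo_dev_t;

/* Inline style: update the statistic as part of the normal flow. */
static void
demo_record_error(demo_dev_t *dev)
{
	assert(dev != NULL);
	dev->dd_stats.ds_errors++;
}

/*
 * Callback style: copy the register into the statistic when asked.
 * This sketch treats the statistics as read-only and rejects writes.
 * Returns 0 on success and -1 on failure.
 */
static int
demo_stats_update(demo_dev_t *dev, int writing)
{
	assert(dev != NULL);
	if (writing)
		return (-1);
	dev->dd_stats.ds_rx_frames = dev->dd_hw_rx_reg;
	return (0);
}
```
.Ed
.Pp
In this model the error count is always current, while the frame count only
changes when
.Fn demo_stats_update
is invoked, which mirrors why a callback suits values that must be read from
hardware.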
.Ss Upgradable Firmware Modules
The UFM
.Pq Upgradable Firmware Module
system in the kernel allows a device driver to provide information about the
firmware modules that are present on a device and is generally used as
supplementary information about a device.
The UFM framework allows a driver to declare a given number of modules that
exist on a given
.Vt dev_info_t .
Each module has some number of slots with different versions.
This information is automatically exported into various consumers such as
.Xr fwflash 8 ,
the Fault Management Architecture,
and the
.Xr ufm 4D
driver's specific ioctls.
.Pp
A driver fills in the operations vector discussed in
.Xr ddi_ufm 9E
and registers it with the kernel by calling
.Xr ddi_ufm_init 9F .
The entry points in this interface include:
.Bl -column -offset -indent "ddi_ufm_op_fill_image(9E)" "ddi_ufm_op_fill_image(9E)"
.It Xr ddi_ufm_op_getcaps 9E Ta Xr ddi_ufm_op_nimages 9E
.It Xr ddi_ufm_op_fill_image 9E Ta Xr ddi_ufm_op_fill_slot 9E
.It Xr ddi_ufm_op_readimg 9E Ta
.El
.Pp
The
.Xr ddi_ufm_op_getcaps 9E
entry point describes the capabilities of the device and what other entry points
the kernel and callers can expect to exist.
The
.Xr ddi_ufm_op_nimages 9E
entry point tells the system how many images there are; if it is not
implemented, the system assumes there is a single image.
The
.Xr ddi_ufm_op_fill_image 9E
and
.Xr ddi_ufm_op_fill_slot 9E
entry points are used to fill in information about images and slots
respectively, while the
.Xr ddi_ufm_op_readimg 9E
entry point is used to read an image from the device for the operating system.
That entry point is often supported when dealing with EEPROMs as many devices do
not have a way of retrieving the actual current firmware.
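.Pp
The handling of an optional entry point can be illustrated with a small
userland model.
The operations vector and helper below
.Po
.Vt demo_ufm_ops_t ,
.Fn demo_ufm_image_count ,
.Fn demo_two_images
.Pc
are hypothetical names made up for this sketch and are not the kernel's UFM
structures; the sketch only shows a consumer falling back to a count of one when
the callback is absent.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations vector with one optional callback. */
typedef struct demo_ufm_ops {
	int (*duo_nimages)(void *arg, unsigned int *nimgp);
} demo_ufm_ops_t;

/* Example callback for a device that reports two images. */
static int
demo_two_images(void *arg, unsigned int *nimgp)
{
	(void) arg;
	*nimgp = 2;
	return (0);
}

/*
 * Return the number of images, defaulting to one when the driver did
 * not supply the callback. Returns 0 if the callback reports failure.
 */
static unsigned int
demo_ufm_image_count(const demo_ufm_ops_t *ops, void *arg)
{
	unsigned int nimg = 1;

	assert(ops != NULL);
	if (ops->duo_nimages == NULL)
		return (nimg);
	if (ops->duo_nimages(arg, &nimg) != 0)
		return (0);
	return (nimg);
}
```
.Ed
.Pp
A vector with a
.Dv NULL
callback yields a count of one, while a vector that points at
.Fn demo_two_images
yields two.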
.Ss USB Host Interface Drivers
Opposite of USB device drivers are the device drivers that make the USB
abstractions work: USB host interface controllers.
The kernel provides a private framework for these, which is discussed in
.Xr usba_hcdi 9E .
An HCDI driver is a character device driver and ends up also instantiating a
root hub as part of its operation and forwards many of its open, close, and
ioctl routines to the corresponding usba hubdi functions.
.Pp
To get started with the framework, a driver will need to call
.Xr usba_hcdi_register 9F
with a filled out
.Xr usba_hcdi_register_args_t 9S
structure.
That registration structure includes the operation vector of callbacks that the
driver fills in, which involve opening and closing pipes
.Po
.Xr usba_hcdi_pipe_open 9E
.Pc ,
issuing the various control, interrupt, bulk, and isochronous transfers
.Po
.Xr usba_hcdi_pipe_bulk_xfer 9E ,
etc.
.Pc ,
and more.
.Sh DTRACE PROBES
By default, the DTrace
.Xr fbt 4D ,
function boundary tracing,
provider will create DTrace probes based on the entry and return points
of most functions in a module
.Pq the primary exception being for some hand-written assembler .
While this is very powerful, driver writers often want to define their own
semantic probes.
The
.Xr sdt 4D ,
statically defined tracing, provider can be used for this.
.Pp
To define an SDT probe, a driver should include
.In sys/sdt.h ,
which defines several macros for probes based on the number of arguments
that are present.
Each probe takes a name, which is constrained by the rules of a C
identifier.
If two underscore characters are present in a row
.Pq Sq __
they will be transformed into a hyphen
.Pq Sq - .
That is, a probe declared with a name of
.Sq hello__world
will be named
.Sq hello-world
and accessible as the DTrace probe
.Ql sdt:::hello-world .
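.Pp
The naming rule can be modeled with a short userland helper.
The function below,
.Fn probe_name_convert ,
is a hypothetical illustration written for this page; it is not the kernel's
implementation, which may differ in detail, and it only demonstrates how each
pair of adjacent underscores collapses into a single hyphen.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rewrite a probe name in place so that each pair of adjacent
 * underscores becomes one hyphen. A lone underscore is kept as is.
 * Returns the length of the converted name.
 */
static size_t
probe_name_convert(char *name)
{
	size_t in = 0, out = 0;

	assert(name != NULL);
	while (name[in] != 0) {
		if (name[in] == '_' && name[in + 1] == '_') {
			name[out++] = '-';
			in += 2;
		} else {
			name[out++] = name[in++];
		}
	}
	name[out] = 0;
	return (out);
}
```
.Ed
.Pp
Given a buffer holding
.Sq hello__world ,
the helper leaves
.Sq hello-world
in the buffer, while a name with only single underscores is unchanged.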
.Pp
Each probe can present a varying number of arguments in DTrace, ranging
from 0-8.
For each DTrace probe argument, one passes both the type of the argument
and the actual value.
The following example from the
.Xr igc 4D
driver shows a DTrace probe that provides four arguments and would be
accessible using the probe
.Ql sdt:::igc-context-desc :
.Bd -literal -offset indent
DTRACE_PROBE4(igc__context__desc, igc_t *, igc, igc_tx_ring_t *,
    ring, igc_tx_state_t *, tx, struct igc_adv_tx_context_desc *,
    ctx);
.Ed
.Pp
In the above example,
.Fa igc ,
.Fa ring ,
.Fa tx ,
and
.Fa ctx
are local variables and function parameters.
.Pp
By default SDT probes are considered
.Sy Volatile ,
in other words they can change at any time and disappear.
This is used to encourage widespread use of SDT probes for what may be
useful for a particular problem or issue that is being investigated.
SDT probes that are stabilized are transformed into their own first
class provider.
.Sh SEE ALSO
.Xr Intro 9 ,
.Xr Intro 9F ,
.Xr Intro 9S