.\"
.\" Copyright (c) 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd July 27, 2021
.Dt PCI 4
.Os
.Sh NAME
.Nm pci
.Nd generic PCI/PCIe bus driver
.Sh SYNOPSIS
To compile the PCI bus driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd device pci
.Ed
.Pp
To compile in support for Single Root I/O Virtualization
.Pq SR-IOV :
.Bd -ragged -offset indent
.Cd options PCI_IOV
.Ed
.Pp
To compile in support for native PCI-express HotPlug:
.Bd -ragged -offset indent
.Cd options PCI_HP
.Ed
.Sh DESCRIPTION
The
.Nm
driver provides support for
.Tn PCI
and
.Tn PCIe
devices in the kernel and limited access to
.Tn PCI
devices for userland.
.Pp
The
.Nm
driver provides a
.Pa /dev/pci
character device that can be used by userland programs to read and write
.Tn PCI
configuration registers.
Programs can also use this device to get a list of all
.Tn PCI
devices, or all
.Tn PCI
devices that match various patterns.
.Pp
Since the
.Nm
driver provides a write interface for
.Tn PCI
configuration registers, system administrators should exercise caution when
granting access to the
.Nm
device.
If used improperly, this driver can allow userland applications to
crash a machine or cause data loss.
In particular, the driver only allows operations on the opened
.Pa /dev/pci
to modify system state if the file descriptor was opened for writing.
For instance, the
.Dv PCIOCREAD
and
.Dv PCIOCBARMMAP
operations require a writeable descriptor, because reading a configuration
register or a BAR could have function-specific side effects.
.Pp
The
.Nm
driver implements the
.Tn PCI
bus in the kernel.
It enumerates any devices on the
.Tn PCI
bus and gives
.Tn PCI
client drivers the chance to attach to them.
It assigns resources to children when the BIOS does not.
It takes care of routing interrupts when necessary.
It reprobes the unattached
.Tn PCI
children when
.Tn PCI
client drivers are dynamically
loaded at runtime.
The
.Nm
driver also includes support for PCI-PCI bridges,
various platform-specific Host-PCI bridges,
and basic support for
.Tn PCI
VGA adapters.
.Sh IOCTLS
The following
.Xr ioctl 2
calls are supported by the
.Nm
driver.
They are defined in the header file
.In sys/pciio.h .
.Bl -tag -width 012345678901234
.It PCIOCGETCONF
This
.Xr ioctl 2
takes a
.Va pci_conf_io
structure.
It allows the user to retrieve information on all
.Tn PCI
devices in the system, or on
.Tn PCI
devices matching patterns supplied by the user.
The call may set
.Va errno
to any value specified in either
.Xr copyin 9
or
.Xr copyout 9 .
The
.Va pci_conf_io
structure consists of a number of fields:
.Bl -tag -width match_buf_len
.It pat_buf_len
The length, in bytes, of the buffer filled with user-supplied patterns.
.It num_patterns
The number of user-supplied patterns.
.It patterns
Pointer to a buffer filled with user-supplied patterns.
.Va patterns
is a pointer to
.Va num_patterns
.Va pci_match_conf
structures.
The
.Va pci_match_conf
structure consists of the following elements:
.Bl -tag -width pd_vendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pd_name
.Tn PCI
device driver name.
.It pd_unit
.Tn PCI
device driver unit number.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_class
.Tn PCI
device class.
.It flags
The flags describe which of the fields the kernel should match against.
A device must match all specified fields in order to be returned.
The match flags are enumerated in the
.Va pci_getconf_flags
enumeration.
The flag values are intended to be self-explanatory and are not
described in detail here.
.El
.It match_buf_len
Length of the
.Va matches
buffer allocated by the user to hold the results of the
.Dv PCIOCGETCONF
query.
.It num_matches
Number of matches returned by the kernel.
.It matches
Buffer containing matching devices returned by the kernel.
The items in this buffer are of type
.Va pci_conf ,
which consists of the following items:
.Bl -tag -width pc_subvendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pc_hdr
.Tn PCI
header type.
.It pc_subvendor
.Tn PCI
subvendor ID.
.It pc_subdevice
.Tn PCI
subdevice ID.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_class
.Tn PCI
device class.
.It pc_subclass
.Tn PCI
device subclass.
.It pc_progif
.Tn PCI
device programming interface.
.It pc_revid
.Tn PCI
revision ID.
.It pd_name
Driver name.
.It pd_unit
Driver unit number.
.El
.It offset
The offset is passed in by the user to tell the kernel where it should
start traversing the device list.
The value passed out by the kernel
points to the record immediately after the last one returned.
The user may
pass the value returned by the kernel in subsequent calls to the
.Dv PCIOCGETCONF
ioctl.
If the user does not intend to use the offset, it must be set to zero.
.It generation
.Tn PCI
configuration generation.
This value only needs to be set if the offset is set.
The kernel will compare the current generation number of its internal
device list to the generation passed in by the user to determine whether
its device list has changed since the user last called the
.Dv PCIOCGETCONF
ioctl.
If the device list has changed, a status of
.Va PCI_GETCONF_LIST_CHANGED
will be passed back.
.It status
The status tells the user the disposition of the request for a device list.
The possible status values are:
.Bl -ohang
.It PCI_GETCONF_LAST_DEVICE
This means that there are no more devices in the PCI device list matching
the specified criteria after the
ones returned in the
.Va matches
buffer.
.It PCI_GETCONF_LIST_CHANGED
This status tells the user that the
.Tn PCI
device list has changed since the last call to the
.Dv PCIOCGETCONF
ioctl, and that the
.Va offset
and
.Va generation
must be reset to zero to start over at the beginning of the list.
.It PCI_GETCONF_MORE_DEVS
This tells the user that the buffer was not large enough to hold all of the
remaining devices in the device list that match the criteria.
.It PCI_GETCONF_ERROR
This indicates a general error while servicing the user's request.
If the
.Va pat_buf_len
is not equal to
.Va num_patterns
times
.Fn sizeof "struct pci_match_conf" ,
.Va errno
will be set to
.Er EINVAL .
.El
.El
.It PCIOCREAD
This
.Xr ioctl 2
reads the
.Tn PCI
configuration registers specified by the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure consists of the following fields:
.Bl -tag -width pi_width
.It pi_sel
A
.Va pcisel
structure which specifies the domain, bus, slot and function the user would
like to query.
If the specified bus is not found,
.Va errno
will be set to
.Er ENODEV
and -1 returned from the ioctl.
.It pi_reg
The
.Tn PCI
configuration register the user would like to access.
.It pi_width
The width, in bytes, of the data the user would like to read.
This value
may be either 1, 2, or 4.
3-byte reads and reads larger than 4 bytes are
not supported.
If an invalid width is passed,
.Va errno
will be set to
.Er EINVAL .
.It pi_data
The data returned by the kernel.
.El
.It PCIOCWRITE
This
.Xr ioctl 2
allows users to write to the
.Tn PCI
configuration registers specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above.
The limitations on data width described for
reading registers, above, also apply to writing
.Tn PCI
configuration registers.
.It PCIOCATTACHED
This
.Xr ioctl 2
allows users to query if a driver is attached to the
.Tn PCI
device specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above; however, the
.Va pi_reg
and
.Va pi_width
fields are not used.
The status of the device is stored in the
.Va pi_data
field.
A value of 0 indicates no driver is attached, while a value larger than 0
indicates that a driver is attached.
.It PCIOCBARMMAP
This
.Xr ioctl 2
command allows a userspace process to
.Xr mmap 2
a memory-mapped PCI BAR into its address space.
The input parameters and results are passed in the
.Va pci_bar_mmap
structure, which has the following fields:
.Bl -tag -width "struct pcisel pbm_sel"
.It Vt uint64_t pbm_map_base
Reports the established mapping base to the caller.
If the
.Va PCIIO_BAR_MMAP_FIXED
flag was specified, then this field must be filled before the call
with the desired address for the mapping.
.It Vt uint64_t pbm_map_length
Reports the mapped length of the BAR, in bytes.
Its
.Vt uint64_t
value is always a multiple of the machine page size.
.It Vt uint64_t pbm_bar_length
Reports the length of the BAR as exposed by the device.
.It Vt int pbm_bar_off
Reports the offset from the mapped base to the start of the
first register in the BAR.
.It Vt struct pcisel pbm_sel
Should be filled before the call.
Describes the device to operate on.
.It Vt int pbm_reg
The BAR index to mmap.
.It Vt int pbm_flags
Flags that augment the operation.
See below.
.It Vt int pbm_memattr
The caching attribute for the mapping.
Typical values are
.Dv VM_MEMATTR_UNCACHEABLE
for control register BARs, and
.Dv VM_MEMATTR_WRITE_COMBINING
for frame buffers.
A regular memory-like BAR should be mapped with the
.Dv VM_MEMATTR_DEFAULT
attribute.
.El
.Pp
Currently defined flags are:
.Bl -tag -width PCIIO_BAR_MMAP_ACTIVATE
.It PCIIO_BAR_MMAP_FIXED
The resulting mapping must be established at the address
specified by the
.Va pbm_map_base
member; otherwise the operation fails.
.It PCIIO_BAR_MMAP_EXCL
Must be used together with
.Dv PCIIO_BAR_MMAP_FIXED .
If the specified base contains already established mappings, the
operation fails instead of implicitly unmapping them.
.It PCIIO_BAR_MMAP_RW
The requested mapping allows both reading and writing.
Without the flag, a read-only mapping is established.
Note that it is common for device registers to have side effects
even on reads.
.It PCIIO_BAR_MMAP_ACTIVATE
(Unimplemented) If the BAR is not activated, activate it in the course
of mapping.
Currently, an attempt to mmap an inactive BAR results in an error.
.El
.El
.Sh LOADER TUNABLES
Tunables can be set at the
.Xr loader 8
prompt before booting the kernel, or stored in
.Xr loader.conf 5 .
The current value of these tunables can be examined at runtime via
.Xr sysctl 8
nodes of the same name.
Unless otherwise specified,
each of these tunables is a boolean that can be enabled by setting the
tunable to a non-zero value.
.Bl -tag -width indent
.It Va hw.pci.clear_bars Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resources.
This forces the
.Tn PCI
bus driver to allocate resource ranges for memory and I/O port resources
from scratch.
.It Va hw.pci.clear_buses Pq Defaults to 0
Ignore any firmware-assigned bus number registers in PCI-PCI bridges.
This forces the
.Tn PCI
bus driver and PCI-PCI bridge driver to allocate bus numbers for secondary
buses behind PCI-PCI bridges.
.It Va hw.pci.clear_pcib Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resource windows in PCI-PCI
bridges.
This forces the PCI-PCI bridge driver to allocate memory and I/O port resources
for resource windows from scratch.
.Pp
By default the PCI-PCI bridge driver will allocate windows that
contain the firmware-assigned resources of devices behind the bridge.
In addition, the PCI-PCI bridge driver will suballocate from existing window
regions when possible to satisfy a resource request.
As a result,
both
.Va hw.pci.clear_bars
and
.Va hw.pci.clear_pcib
must be enabled to fully ignore firmware-supplied resource assignments.
.It Va hw.pci.default_vgapci_unit Pq Defaults to -1
By default,
the first
.Tn PCI
VGA adapter encountered by the system is assumed to be the boot display device.
This tunable can be set to choose a specific VGA adapter by specifying the
unit number of the associated
.Va vgapci Ns Ar X
device.
.It Va hw.pci.do_power_nodriver Pq Defaults to 0
Place devices into a low power state
.Pq D3
when a suitable device driver is not found.
Can be set to one of the following values:
.Bl -tag -width indent
.It 3
Powers down all
.Tn PCI
devices without a device driver.
.It 2
Powers down most devices without a device driver.
PCI devices with the display, memory, and base peripheral device classes
are not powered down.
.It 1
Similar to a setting of 2 except that storage controllers are also not
powered down.
.It 0
All devices are left fully powered.
.El
.Pp
A
.Tn PCI
device must support power management to be powered down.
Placing a device into a low power state may not reduce power consumption.
.It Va hw.pci.do_power_resume Pq Defaults to 1
Place
.Tn PCI
devices into the fully powered state when resuming either the system or an
individual device.
Setting this to zero is discouraged as the system will not attempt to power
up non-powered PCI devices after a suspend.
.It Va hw.pci.do_power_suspend Pq Defaults to 1
Place
.Tn PCI
devices into a low power state when suspending either the system or individual
devices.
Normally the D3 state is used as the low power state,
but firmware may override the desired power state during a system suspend.
.It Va hw.pci.enable_ari Pq Defaults to 1
Enable support for PCI-express Alternative RID Interpretation.
This is often used in conjunction with SR-IOV.
.It Va hw.pci.enable_io_modes Pq Defaults to 1
Enable memory or I/O port decoding in a PCI device's command register if it has
firmware-assigned memory or I/O port resources.
The firmware
.Pq BIOS
in some systems does not enable memory or I/O port decoding for some devices
even when it has assigned resources to the device.
This enables decoding for such resources during bus probe.
.It Va hw.pci.enable_msi Pq Defaults to 1
Enable support for Message Signalled Interrupts
.Pq MSI .
MSI interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_msix Pq Defaults to 1
Enable support for extended Message Signalled Interrupts
.Pq MSI-X .
MSI-X interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_pcie_hp Pq Defaults to 1
Enable support for native PCI-express HotPlug.
.It Va hw.pci.honor_msi_blacklist Pq Defaults to 1
MSI and MSI-X interrupts are disabled for certain chipsets known to have
broken MSI and MSI-X implementations when this tunable is set.
It can be set to zero to permit use of MSI and MSI-X interrupts if the
chipset match is a false positive.
.It Va hw.pci.iov_max_config Pq Defaults to 1MB
The maximum amount of memory permitted for the configuration parameters
used when creating Virtual Functions via SR-IOV.
This tunable can also be changed at runtime via
.Xr sysctl 8 .
.It Va hw.pci.realloc_bars Pq Defaults to 0
Attempt to allocate a new resource range during the initial device scan
for any memory or I/O port resources with firmware-assigned ranges that
conflict with another active resource.
.It Va hw.pci.usb_early_takeover Pq Defaults to 1 on Tn amd64 and Tn i386
Disable legacy device emulation of USB devices during the initial device
scan.
Set this tunable to zero to use USB devices via legacy emulation when
using a custom kernel without USB controller drivers.
.It Va hw.pci<D>.<B>.<S>.INT<P>.irq
These tunables can be used to override the interrupt routing for legacy
PCI INTx interrupts.
Unlike other tunables in this list,
these do not have corresponding sysctl nodes.
The tunable name includes the address of the PCI device as well as the
pin of the desired INTx IRQ to override:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <P>
The interrupt pin of the PCI slot to override.
One of
.Ql A ,
.Ql B ,
.Ql C ,
or
.Ql D .
.El
.Pp
The value of the tunable is the raw IRQ value to use for the INTx interrupt
pin identified by the tunable name.
Mapping of IRQ values to platform interrupt sources is machine dependent.
.El
.Sh DEVICE WIRING
A device unit can be wired to a given location with
.Xr device.hints 5 .
Entries of the form
.Va hint.<name>.<unit>.at="pci<B>:<S>:<F>"
or
.Va hint.<name>.<unit>.at="pci<D>:<B>:<S>:<F>"
will force the driver
.Va name
to probe and attach at unit
.Va unit
for any PCI device found to match the specification, where:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
Defaults to 0 if unspecified.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <F>
The function of the PCI device in decimal.
.El
.Pp
The code to do the matching requires an exact string match.
Do not specify the angle brackets
.Pq < >
in the hints file.
Wiring multiple devices to the same
.Va name
and
.Va unit
produces undefined results.
.Ss Examples
Given the following lines in
.Pa /boot/device.hints :
.Cd hint.nvme.3.at="pci6:0:0"
.Cd hint.igb.8.at="pci14:0:0"
If there is a device that supports
.Xr igb 4
at PCI bus 14 slot 0 function 0,
then it will be assigned igb8 for probe and attach.
Likewise, if there is an
.Xr nvme 4
card at PCI bus 6 slot 0 function 0,
then it will be assigned nvme3 for probe and attach.
If another type of card is in either of these locations, the name and
unit of that card will be the default names and will be unaffected by
these hints.
If other igb or nvme cards are located elsewhere, they will be
assigned their unit numbers sequentially, skipping the unit numbers
that have 'at' hints.
.Sh FILES
.Bl -tag -width /dev/pci -compact
.It Pa /dev/pci
Character device for the
.Nm
driver.
.El
.Sh SEE ALSO
.Xr pciconf 8
.Sh HISTORY
The
.Nm
driver (not the kernel's
.Tn PCI
support code) first appeared in
.Fx 2.2 ,
and was written by Stefan Esser and Garrett Wollman.
Support for device listing and matching was re-implemented by
Kenneth Merry, and first appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
It is not possible for users to specify an accurate offset into the device
list without calling the
.Dv PCIOCGETCONF
ioctl at least once, since they have no way of knowing the current generation
number otherwise.
This is probably not a serious problem, though, since
users can easily narrow their search by specifying a pattern or patterns
for the kernel to match against.