/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2019 Joyent, Inc.
 * Copyright 2023 Oxide Computer Company
 */

/*
 * PCIe Initialization
 * -------------------
 *
 * The PCIe subsystem is split apart and initializes itself in a couple of
 * different places. This is due to the platform-specific nature of initializing
 * resources and the nature of the SPARC PROM and how that influenced the
 * subsystem. Note that traditional PCI (mostly seen these days in Virtual
 * Machines) follows most of the same basic path outlined here, but skips a
 * large chunk of PCIe-specific initialization.
 *
 * First, there is an initial device discovery phase that is taken care of by
 * the platform. This is where we discover the set of devices that are present
 * at system power on. These devices may or may not be hot-pluggable. In
 * particular, this happens in a platform-specific way right now. In general, we
 * expect most discovery to be driven by scanning each bus, device, and
 * function, and seeing what actually exists and responds to configuration space
 * reads. This is driven via pci_boot.c on x86. This may be seeded by something
 * like a device tree or a PROM, supplemented with ACPI, or by knowledge that
 * the underlying platform has.
 *
 * As a part of this discovery process, the full set of resources that exist in
 * the system for PCIe are:
 *
 *   o PCI buses
 *   o Prefetchable memory
 *   o Non-prefetchable memory
 *   o I/O ports
 *
 * This process is driven by a platform's PCI platform Resource Discovery (PRD)
 * module. The PRD definitions can be found in <sys/plat/pci_prd.h> and are used
 * to discover these resources, which will be converted into the initial set of
 * the standard properties in the system: 'regs', 'available', 'ranges', etc.
 * Currently it is up to platform-specific code (which should ideally be
 * consolidated at some point) to set up all these properties.
 *
 * As a part of the discovery process, the platform code will create a device
 * node (dev_info_t) for each discovered function and will create a PCIe nexus
 * for each overall root complex that exists in the system. Most root complexes
 * will have multiple root ports, each of which is the foundation of an
 * independent PCIe bus due to the point-to-point nature of PCIe. When a root
 * complex is found, a nexus driver such as npe (Nexus for PCI Express) is
 * attached. In the case of a non-PCIe-capable system this is where the older
 * pci nexus driver would be used instead.
 *
 * To track data about a given device on a bus, a 'pcie_bus_t' structure is
 * created for and assigned to every PCIe-based dev_info_t. It contains a
 * pointer to the corresponding root port and can be used to get basic
 * information about the device, its faults, and related state.
 *
 * A root complex has its pcie_bus_t initialized as part of the device discovery
 * process. Because we're trying to bootstrap the actual tree and most platforms
 * don't have an explicitly discoverable representation of it, this is created
 * manually. See callers of pcie_rc_init_bus().
 *
 * For other devices, bridges, and switches, the process is split into two
 * passes. There is an initial pcie_bus_t that is created which will exist
 * before we go through the actual driver attachment process. For example, on
 * x86 this is done as part of the device and function discovery. The second
 * pass of initialization is done only after the nexus driver actually is
 * attached and it goes through and finishes processing all of its children.
 *
 * Child Initialization
 * --------------------
 *
 * Generally speaking, the platform will first enumerate all PCIe devices that
 * are in the system before it actually creates a device tree. This is part of
 * the bus/device/function scanning that is performed and from that dev_info_t
 * nodes are created for each discovered device and are inserted into the
 * broader device tree. Later in boot, the actual device tree is walked and the
 * nodes go through the standard dev_info_t initialization process (DS_PROTO,
 * DS_LINKED, DS_BOUND, etc.).
 *
 * PCIe-specific initialization can roughly be broken into the following pieces:
 *
 *   1. Platform initial discovery and resource assignment
 *   2. The pcie_bus_t initialization
 *   3. Nexus driver child initialization
 *   4. Fabric initialization
 *   5. Device driver-specific initialization
 *
 * The first two parts, (1) and (2), are discussed in the previous section.
 * Part (1) in particular is a combination of the PRD (platform resource
 * discovery) and general device initialization. After this, because we have a
 * device tree, most of the standard nexus initialization happens.
 *
 * (5) is somewhat simple, so let's get into it before we discuss (3) and (4).
 * This is the last thing that is called and happens after all of the others
 * are done. This is the logic that occurs in a driver's attach(9E) entry
 * point. It is always device-specific, and generally speaking drivers should
 * not be manipulating standard PCIe registers directly on their own. For
 * example, the MSI/MSI-X, AER, Serial Number, etc. capabilities will be
 * automatically dealt with by the framework in (3) and (4) below. In many
 * cases, particularly things that are part of (4), adjusting them in the
 * individual driver is not safe.
 *
 * Finally, let's talk about (3) and (4) as these are related. The NDI provides
 * a standard hook for a nexus to initialize its children. In our platforms,
 * there are basically two possible PCIe nexus drivers: there is the generic
 * pcieb -- PCIe bridge -- driver which is used for standard root ports,
 * switches, etc. Then there is the platform-specific primary nexus driver,
 * which is being slowly consolidated into a single one where it makes sense. An
 * example of this is npe.
 *
 * Each of these has a child initialization function which is called from its
 * DDI_CTLOPS_INITCHILD operation on the bus_ctl function pointer. This goes
 * through and initializes a large number of different PCIe-based settings
 * through the common pcie_initchild() function. This takes care of things
 * like:
 *
 *   o Advanced Error Reporting
 *   o Alternative Routing-ID Interpretation (ARI)
 *   o Capturing information around link speed, width, serial numbers, etc.
 *   o Setting common properties around aborts
 *
 * There are a few caveats with this that need to be kept in mind:
 *
 *   o A dev_info_t indicates a specific function. This means that the
 *     functions of a multi-function device will not all be initialized at the
 *     same time and there is no guarantee that all children will be
 *     initialized before one of them is attached.
 *   o A child is only initialized if we have found a driver that matches an
 *     alias in the dev_info_t's compatible array property. While a lot of
 *     multi-function devices are often multiple instances of the same thing
 *     (e.g. a multi-port NIC with one function per port), this is not always
 *     the case and one cannot make any assumptions here.
 *
 * This in turn leads to the next form of initialization that takes place in the
 * case of (4). This is where we take care of things that need to be consistent
 * across either entire devices or more generally across an entire root port and
 * all of its children. There are a few different examples of this:
 *
 *   o Setting the maximum payload size
 *   o Determining the tag width
 *
 * Note that features which are only based on function 0, such as ASPM (Active
 * State Power Management), hardware autonomous width disable, etc. ultimately
 * do not go through this path today. There are some implications here in that
 * today several of these things are captured on functions which may not have
 * any control here. This is an area of needed improvement.
 *
 * The settings in (4) are initialized in a common way, via
 * pcie_fabric_setup(). This is called into from two different parts of
 * the stack:
 *
 *   1. When we attach a root port, which is driven by pcieb.
 *   2. When we have a hotplug event that adds a device.
 *
 * In general here we are going to use the term 'fabric' to refer to everything
 * that is downstream of a root port. This corresponds to what the PCIe
 * specification calls a 'hierarchy domain'. Strictly speaking, this is fine
 * until peer-to-peer requests need to be forwarded across root ports. At that
 * point the scope of the fabric increases and these settings become more
 * complicated. We currently optimize for the much more common case, which is
 * that each root port is effectively independent from a PCIe transaction
 * routing perspective.
 *
 * Put differently, we use the term 'fabric' to refer to a set of PCIe devices
 * that can route transactions to one another, which is generally constrained to
 * everything under a root port, with root ports being independent of one
 * another. If this constraint changes, then all one needs to do is replace the
 * discussion of the root port below with the broader root complex and system.
 *
 * A challenge with these settings is that once they're set and devices are
 * actively making requests, we cannot really change them without resetting the
 * links and cancelling all outstanding transactions via device resets. Because
 * this is not something that we want to do, we instead constrain how and when
 * these settings are made.
 *
 * Because of this we basically say that if a given fabric has more than one
 * hot-plug capable device that's encountered, then we have to use safe defaults
 * (which we can allow an operator to tune eventually via pcieadm). If we have a
 * mix of non-hotpluggable slots with downstream endpoints present and
 * hot-pluggable slots, then we're in this case. If we don't have hot-pluggable
 * slots, then we can have an arbitrarily complex setup. Let's look at a few of
 * these visually:
 *
 * In the following diagrams, RP stands for Root Port, EP stands for Endpoint.
 * If something is hot-pluggable, then we label it with (HP).
 *
 *   (1) RP --> EP
 *   (2) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> EP
 *
 *   (3) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> Switch --> EP
 *                               +--> EP
 *                               +--> EP
 *
 *   (4) RP (HP) --> EP
 *   (5) RP (HP) --> Switch --> EP
 *                         +--> EP
 *                         +--> EP
 *
 *   (6) RP --> Switch (HP) --> EP
 *   (7) RP (HP) --> Switch (HP) --> EP
 *
 * If we look at all of these, these are all cases where it's safe for us to set
 * things based on all devices. (1), (2), and (3) are straightforward because
 * they have no hot-pluggable elements. This means that nothing should come or
 * go on the system and we can set up fabric-wide properties as part of the root
 * port.
 *
 * Case (4) is the most standard one that we encounter for hot-plug. Here you
 * have a root port directly connected to an endpoint. The most common example
 * would be an NVMe device plugged into a root port. Case (5) is interesting to
 * highlight. While there is a switch and multiple endpoints there, they are
 * showing up as a unit. This ends up being a weirder variant of (4), but it is
 * safe for us to set advanced properties because we can figure out what the
 * total set should be.
 *
 * Now, the more interesting bits here are (6) and (7). The reason that (6)
 * works is that ultimately there is only a single downstream port here that is
 * hot-pluggable and all non-hotpluggable ports do not have a device present,
 * which suggests that they will never have a device present. (7) also could be
 * made to work by making the observation that if there's truly only one
 * endpoint in a fabric, it doesn't matter how many switches there are that are
 * hot-pluggable. This would only hold if we can assume for some reason that no
 * other endpoints could be added.
 *
 * In turn, let's look at several cases that we believe aren't safe:
 *
 *   (8) RP --> Switch --> EP
 *                    +--> EP
 *               (HP) +--> EP
 *
 *   (9) RP --> Switch (HP) +--> EP
 *                     (HP) +--> EP
 *
 *   (10) RP (HP) --> Switch (HP) +--> EP
 *                           (HP) +--> EP
 *
 * All of these are situations that are explicitly unsafe. Let's take (8). The
 * problem here is that the devices on the non-hotpluggable downstream switch
 * ports are always there and we should assume all device drivers will be
 * active and performing I/O when the hot-pluggable slot changes. If the
 * hot-pluggable slot has a lower max payload size, then we're mostly out of
 * luck. The case of (9) is very similar to (8), just that we have more hot-plug
 * capable slots.
 *
 * Finally (10) is a case of multiple instances of hotplug. (9) and (10) are the
 * more general case of (6) and (7). While we can try to detect (6) and (7) more
 * generally or try to make it safe, we're going to start with a simpler form of
 * detection for this, which roughly follows the following rules:
 *
 *   o If there are no hot-pluggable slots in an entire fabric, then we can set
 *     all fabric properties based on device capabilities.
 *   o If we encounter a hot-pluggable slot, we can only set fabric properties
 *     based on device capabilities if:
 *
 *       1. The hotpluggable slot is a root port.
 *       2. There are no other hotpluggable devices downstream of it.
 *
 * Otherwise, if neither of the above is true, then we must use the basic PCIe
 * defaults for various fabric-wide properties (discussed below). Even in these
 * more complicated cases, device-specific properties such as the configuration
 * of AERs, ASPM, etc. are still handled in the general pcie_init_bus() and
 * related functions discussed earlier.
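 *
 * As a rough illustration only, the two rules above can be written as the
 * following userland C sketch. It is not the implementation in this file. The
 * ex_fab_node_t type and ex_fabric_is_simple() are hypothetical. The sketch
 * assumes the fabric has been flattened into an array holding every device
 * under a single root port, including the root port itself:
 *
 *	#include <stdbool.h>
 *	#include <stddef.h>
 *
 *	// Hypothetical, simplified view of one device in a fabric.
 *	typedef struct ex_fab_node {
 *		bool	efn_root_port;	// this node is the root port
 *		bool	efn_hotplug;	// this node is a hot-plug capable slot
 *	} ex_fab_node_t;
 *
 *	// Hypothetical helper. Returns true when fabric-wide properties may
 *	// be derived from device capabilities, and false when the basic PCIe
 *	// defaults must be used instead.
 *	bool
 *	ex_fabric_is_simple(const ex_fab_node_t *nodes, size_t nnodes)
 *	{
 *		size_t nhp = 0;
 *		bool rp_hp = false;
 *
 *		for (size_t i = 0; i < nnodes; i++) {
 *			if (!nodes[i].efn_hotplug)
 *				continue;
 *			nhp++;
 *			if (nodes[i].efn_root_port)
 *				rp_hp = true;
 *		}
 *
 *		// Either there are no hot-plug slots at all, or there is exactly
 *		// one and it is the root port, so nothing below it is hot-plug.
 *		return (nhp == 0 || (nhp == 1 && rp_hp));
 *	}
 *
 * Under this sketch, cases (4) and (5) count as simple. Cases (6) and (7)
 * fall back to the defaults, which matches the conservative rules above.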
 *
 * Because the only fabrics that we'll change are those that correspond to root
 * ports, we will only call into the actual fabric feature setup when one of
 * those changes. This has the side effect of simplifying locking. When we make
 * changes here we need to be able to hold the entire device tree under the root
 * port (including the root port and its parent). This is much harder to do
 * safely when starting in the middle of the tree.
 *
 * Handling of Specific Properties
 * -------------------------------
 *
 * This section goes into the rationale behind how we initialize and program
 * various parts of the PCIe stack.
 *
 * 5-, 8-, 10- AND 14-BIT TAGS
 *
 * Tags are part of PCIe transactions and when combined with a device identifier
 * are used to uniquely identify a transaction. In PCIe parlance, a Requester
 * (someone who initiates a PCIe request) sets a unique tag in the request and
 * the Completer (someone who processes and responds to a PCIe request) echoes
 * the tag back. This means that a requester generally is responsible for
 * ensuring that it doesn't reuse a tag between outstanding transactions.
 *
 * Thus the number of tags that a device has relates to the number of
 * outstanding transactions that it can have, which are usually tied to the
 * number of outstanding DMA transfers. The size of these transactions is also
 * then scoped by the handling of the Maximum Payload Size.
 *
 * In PCIe 1.0, devices default to a 5-bit tag. There was also an option to
 * support an 8-bit tag. The 8-bit extended tag did not distinguish between a
 * Requester or Completer. There was a bit to indicate device support of 8-bit
 * tags in the Device Capabilities Register of the PCIe Capability and a
 * separate bit to enable it in the Device Control Register of the PCIe
 * Capability.
 *
 * In PCIe 4.0, support for a 10-bit tag was added. The specification broke
 * apart the support bit into multiple pieces. In particular, in the Device
 * Capabilities 2 register of the PCIe Capability there are separate bits to
 * indicate whether the device supports 10-bit completions and 10-bit requests.
 * All PCIe 4.0 compliant devices are required to support 10-bit tags if they
 * operate at 16.0 GT/s speed (a PCIe Gen 4 compliant device does not have to
 * operate at Gen 4 speeds).
 *
 * This allows a device to support 10-bit completions but not 10-bit requests.
 * A device that supports 10-bit requests is required to support 10-bit
 * completions. There is no way to enable or disable 10-bit completion support;
 * the only control bit, in the Device Control 2 register, enables 10-bit
 * requests. This distinction makes our life easier as this means that as long
 * as the entire fabric supports 10-bit completions, it doesn't matter if not
 * all devices support 10-bit requests and we can enable them as required. More
 * on this in a bit.
 *
 * In PCIe 6.0, another set of bits was added for 14-bit tags. These follow the
 * same pattern as the 10-bit tags. The biggest difference is that the
 * capabilities and control for these are found in the Device Capabilities 3
 * and Device Control 3 registers of the Device 3 Extended Capability. Similar
 * to what we see with 10-bit tags, requesters are required to support the
 * completer capability. The only control bit is for whether or not they enable
 * a 14-bit requester.
 *
 * PCIe switches sit between root ports and endpoints and show up to software as
 * a set of bridges. Bridges generally don't have to know about tags as they are
 * usually neither requesters nor completers (unless directly talking to the
 * bridge instance). That is, they are generally required to forward packets
 * without modifying them. This works until we deal with switch error handling.
 * At that point, the switch may try to interpret the transaction and, if it
 * doesn't understand the tagging scheme in use, return the transaction with
 * the wrong tag and also an incorrectly diagnosed error (usually a malformed
 * TLP).
 *
 * With all this, we construct a somewhat simple policy of how and when we
 * enable extended tags:
 *
 *   o If we have a complex hotplug-capable fabric (based on the discussion
 *     earlier in fabric-specific settings), then we cannot enable any of the
 *     8-bit, 10-bit, and 14-bit tagging features. This is due to the issues
 *     with intermediate PCIe switches described above.
 *
 *   o If every device supports 8-bit tags, then we will go through and enable
 *     those everywhere.
 *
 *   o If every device supports 10-bit tag completions, then we will enable the
 *     10-bit tag requester on every device that supports it.
 *
 *   o If every device supports 14-bit tag completions, then we will enable the
 *     14-bit tag requester on every device that supports it.
 *
 * This is the simpler end of the policy and one that is relatively easy to
 * implement. While we could attempt to relax the constraint that every device
 * in the fabric implement these features by making assumptions about
 * peer-to-peer requests (that is, devices at the same layer in the tree won't
 * talk to one another), that is a lot of complexity. For now, we leave such an
 * implementation to those who need it in the future.
 *
 * MAX PAYLOAD SIZE
 *
 * When performing transactions on the PCIe bus, a given transaction has a
 * maximum allowed size. This size is called the MPS or 'Maximum Payload Size'.
 * A given device reports its maximum supported size in the Device Capabilities
 * register of the PCIe Capability. It is then set in the Device Control
 * register.
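 *
 * As a rough illustration, here is a userland C sketch of how both registers
 * encode the size. It is not code from this file, and the ex_* names are
 * hypothetical. Each register uses a 3-bit field that means 128 << n bytes,
 * so 0 is 128 bytes and 5 is 4096 bytes, while 6 and 7 are reserved and not
 * handled here. The supported size lives in bits 2:0 of Device Capabilities.
 * The programmed size lives in bits 7:5 of Device Control:
 *
 *	#include <stdint.h>
 *
 *	// Hypothetical names for the PCIe Base Specification field layout.
 *	#define	EX_DEVCAP_MPS_MASK	0x0007U	// Device Capabilities [2:0]
 *	#define	EX_DEVCTL_MPS_MASK	0x00e0U	// Device Control [7:5]
 *	#define	EX_DEVCTL_MPS_SHIFT	5
 *
 *	// Largest payload, in bytes, that the device reports supporting.
 *	uint32_t
 *	ex_mps_supported_bytes(uint32_t devcap)
 *	{
 *		return (128U << (devcap & EX_DEVCAP_MPS_MASK));
 *	}
 *
 *	// Payload size, in bytes, currently programmed in Device Control.
 *	uint32_t
 *	ex_mps_programmed_bytes(uint16_t devctl)
 *	{
 *		return (128U << ((devctl & EX_DEVCTL_MPS_MASK) >>
 *		    EX_DEVCTL_MPS_SHIFT));
 *	}
 *
 * For example, a Device Capabilities value whose bits 2:0 are 010b yields 512
 * bytes, and a Device Control value of 0x0020 yields 256 bytes.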
 *
 * One of the challenges with this value is that different functions of a device
 * have independent values, but strictly speaking are required to actually have
 * the same value programmed in all of them lest device behavior go awry. When
 * a device has the ARI (Alternative Routing-ID Interpretation) capability
 * enabled, then only function 0 controls the actual payload size.
 *
 * The settings for this need to be consistent throughout the fabric. A
 * Transmitter is not allowed to create a TLP that exceeds its maximum payload
 * size and a Receiver is not allowed to receive a TLP that exceeds its maximum
 * payload size. In all of these cases, a violation would result in something
 * like a malformed TLP error.
 *
 * Effectively, this means that everything on a given fabric must have the same
 * value programmed in its Device Control register for this value. While in the
 * case of tags, switches generally weren't completers or requesters, here every
 * device along the path is subject to this. This makes the actual value that we
 * set throughout the fabric even more important and the constraints of hotplug
 * even worse to deal with.
 *
 * Because a hotplug device can be inserted with any payload size, if we hit
 * anything other than the simple hotplug cases discussed in the fabric-specific
 * settings section, then we must use the smallest size of 128 byte payloads.
 * This is because a device could be plugged in that supports something smaller
 * than we had otherwise set. If there are other active devices, those could not
 * be changed without quiescing the entire fabric. As such our algorithm is as
 * follows:
 *
 *   1. Scan the entire fabric, keeping track of the smallest seen MPS in the
 *      Device Capabilities Register.
 *   2. If we have a complex fabric, program each Device Control register with
 *      a 128 byte maximum payload size; otherwise, program it with the
 *      discovered value.
 *
 * MAX READ REQUEST SIZE
 *
 * The maximum read request size (MRRS) is a much more confusing thing when
 * compared to its maximum payload size counterpart. The maximum payload size
 * (MPS) above is what restricts the actual size of a TLP. The MRRS value is
 * used to control part of the behavior of a Memory Read Request, which is not
 * strictly speaking subject to the MPS. A completer is allowed to satisfy a
 * single Memory Read Request with multiple completions, each carrying fewer
 * bytes than were requested in total. In general, the default size that a root
 * complex and its root port will reply with is based around the length of a
 * cache line.
 *
 * What this ultimately controls is the number of requests that the Requester
 * has to make, trading off bandwidth, bus sharing, and related concerns. For
 * example, if the maximum read request size is 4 KiB, then the requester would
 * only issue a single read request asking for 4 KiB. It would still receive
 * the data as multiple packets in units of the MPS. If, however, the maximum
 * read request size was only, say, 512 bytes, then it would need to make 8
 * separate requests, potentially increasing latency. On the other hand, if
 * systems are relying on total requests for QoS, then it's important to set it
 * to something that's closer to the actual MPS.
 *
 * Traditionally, the OS has not been the most straightforward about this. It's
 * important to remember that setting this up is also somewhat in the realm of
 * system firmware. Under the PCI Firmware specification, the firmware may have
 * set up a value for not just the MRRS but also the MPS. As such, our logic
 * basically left the MRRS alone and used whatever the device had there as long
 * as we weren't shrinking the device's MPS. If we were, then we'd set it to the
 * MPS. If the device was a root port, then it was just left at a system-wide
 * and PCIe default of 512 bytes.
 *
 * If we survey firmware (which isn't easy due to its nature), we have seen that
 * in most cases the firmware just doesn't do anything and leaves the value at
 * the device's default, which is basically just the PCIe default, unless it has
 * specific knowledge of something, such as wanting to do something for an NVMe
 * device. The same is generally true of other systems, which leave it at its
 * default unless otherwise set by a device driver.
 *
 * Because this value doesn't really have the same constraints as other fabric
 * properties, this becomes much simpler and we instead opt to set it as part of
 * the device node initialization. In addition, there are no real rules about
 * different functions having different values here as it doesn't really impact
 * the TLP processing the same way that the MPS does.
 *
 * While we should add a fuller way of setting this and allowing operator
 * override of the MRRS based on things like device class, etc. that is driven
 * by pcieadm, that is left to the future. For now we opt to ensure that all
 * devices are kept at their default (512 bytes or whatever firmware left
 * behind) and that root ports always have the MRRS set to 512 bytes.
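 *
 * As a rough, hypothetical illustration of the MPS algorithm and the root port
 * MRRS policy described above, here is a userland C sketch. It is not the code
 * in this file, and the ex_* names are invented for the example. It works in
 * terms of the 3-bit size encoding shared by the Device Capabilities and
 * Device Control registers, where an encoding of n means 128 << n bytes:
 *
 *	#include <stdbool.h>
 *	#include <stddef.h>
 *	#include <stdint.h>
 *
 *	// Hypothetical names for the Device Control field layout.
 *	#define	EX_DEVCTL_MPS_MASK	0x00e0U	// Device Control [7:5]
 *	#define	EX_DEVCTL_MPS_SHIFT	5
 *	#define	EX_DEVCTL_MRRS_MASK	0x7000U	// Device Control [14:12]
 *	#define	EX_DEVCTL_MRRS_SHIFT	12
 *	#define	EX_SIZE_128		0U	// 128 << 0 bytes
 *	#define	EX_SIZE_512		2U	// 128 << 2 bytes
 *
 *	// Hypothetical helper for steps 1 and 2. It returns the smallest
 *	// supported MPS encoding in the fabric, or 128 bytes for a complex
 *	// hotplug fabric. 'supported' has at least one entry (the root port).
 *	uint8_t
 *	ex_fabric_mps(const uint8_t *supported, size_t n, bool complex_hp)
 *	{
 *		if (complex_hp)
 *			return (EX_SIZE_128);
 *
 *		uint8_t min = supported[0];
 *		for (size_t i = 1; i < n; i++) {
 *			if (supported[i] < min)
 *				min = supported[i];
 *		}
 *		return (min);
 *	}
 *
 *	// Hypothetical helper. It programs the fabric MPS into a Device
 *	// Control value and leaves other bits alone. Root ports also get a
 *	// 512-byte MRRS.
 *	uint16_t
 *	ex_devctl_program(uint16_t devctl, uint8_t mps, bool root_port)
 *	{
 *		devctl &= ~EX_DEVCTL_MPS_MASK;
 *		devctl |= (uint16_t)(mps << EX_DEVCTL_MPS_SHIFT);
 *		if (root_port) {
 *			devctl &= ~EX_DEVCTL_MRRS_MASK;
 *			devctl |= EX_SIZE_512 << EX_DEVCTL_MRRS_SHIFT;
 *		}
 *		return (devctl);
 *	}
 *
 * For example, for a simple fabric whose devices report encodings 2, 1, and 3,
 * ex_fabric_mps() returns 1 (256 bytes). For a root port,
 * ex_devctl_program(0x0000, 1, true) then yields 0x2020.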
 */

#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/modctl.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/sunndi.h>
#include <sys/fm/protocol.h>
#include <sys/fm/util.h>
#include <sys/promif.h>
#include <sys/disp.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <sys/pci_cap.h>
#include <sys/pci_impl.h>
#include <sys/pcie_impl.h>
#include <sys/hotplug/pci/pcie_hp.h>
#include <sys/hotplug/pci/pciehpc.h>
#include <sys/hotplug/pci/pcishpc.h>
#include <sys/hotplug/pci/pcicfg.h>
#include <sys/pci_cfgacc.h>
#include <sys/sysevent.h>
#include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/pcie.h>

/* Local function prototypes */
static void pcie_init_pfd(dev_info_t *);
static void pcie_fini_pfd(dev_info_t *);

#if defined(__x86)
static void pcie_check_io_mem_range(ddi_acc_handle_t, boolean_t *, boolean_t *);
#endif /* defined(__x86) */

#ifdef DEBUG
uint_t pcie_debug_flags = 0;
static void pcie_print_bus(pcie_bus_t *bus_p);
void pcie_dbg(char *fmt, ...);
#endif /* DEBUG */

/* Variable to control default PCI-Express config settings */
ushort_t pcie_command_default =
    PCI_COMM_SERR_ENABLE |
    PCI_COMM_WAIT_CYC_ENAB |
    PCI_COMM_PARITY_DETECT |
    PCI_COMM_ME |
    PCI_COMM_MAE |
    PCI_COMM_IO;

/* xxx_fw are bits that are controlled by FW and should not be modified */
ushort_t pcie_command_default_fw =
    PCI_COMM_SPEC_CYC |
    PCI_COMM_MEMWR_INVAL |
    PCI_COMM_PALETTE_SNOOP |
    PCI_COMM_WAIT_CYC_ENAB |
    0xF800; /* Reserved Bits */

ushort_t pcie_bdg_command_default_fw =
    PCI_BCNF_BCNTRL_ISA_ENABLE |
    PCI_BCNF_BCNTRL_VGA_ENABLE |
    0xF000; /* Reserved Bits */

/* PCI-Express Base error defaults */
ushort_t pcie_base_err_default =
    PCIE_DEVCTL_CE_REPORTING_EN |
    PCIE_DEVCTL_NFE_REPORTING_EN |
    PCIE_DEVCTL_FE_REPORTING_EN |
    PCIE_DEVCTL_UR_REPORTING_EN;

/* PCI-Express Device Control Register */
uint16_t pcie_devctl_default = PCIE_DEVCTL_RO_EN |
    PCIE_DEVCTL_MAX_READ_REQ_512;

/* PCI-Express AER Root Control Register */
#define	PCIE_ROOT_SYS_ERR	(PCIE_ROOTCTL_SYS_ERR_ON_CE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_FE_EN)

ushort_t pcie_root_ctrl_default =
    PCIE_ROOTCTL_SYS_ERR_ON_CE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_FE_EN;

/* PCI-Express Root Error Command Register */
ushort_t pcie_root_error_cmd_default =
    PCIE_AER_RE_CMD_CE_REP_EN |
    PCIE_AER_RE_CMD_NFE_REP_EN |
    PCIE_AER_RE_CMD_FE_REP_EN;

/* ECRC settings in the PCIe AER Control Register */
uint32_t pcie_ecrc_value =
    PCIE_AER_CTL_ECRC_GEN_ENA |
    PCIE_AER_CTL_ECRC_CHECK_ENA;

/*
 * If a particular platform wants to disable certain errors such as UR/MA,
 * instead of using #defines have the platform's PCIe Root Complex driver set
 * these masks using the pcie_get_XXX_mask and pcie_set_XXX_mask functions. For
 * x86 the closest thing to a PCIe root complex driver is NPE. For SPARC the
 * closest PCIe root complex driver is PX.
 *
 * pcie_serr_disable_flag: disable SERR only (in the RCR and command register).
 * x86 systems may want to disable SERR in general. For root ports, enabling
 * SERR causes NMIs which are not handled and result in a watchdog timeout
 * error.
 */
uint32_t pcie_aer_uce_mask = 0;		/* AER UE Mask */
uint32_t pcie_aer_ce_mask = 0;		/* AER CE Mask */
uint32_t pcie_aer_suce_mask = 0;	/* AER Secondary UE Mask */
uint32_t pcie_serr_disable_flag = 0;	/* Disable SERR */

/* Default severities needed for eversholt. Error handling doesn't care */
uint32_t pcie_aer_uce_severity = PCIE_AER_UCE_MTLP | PCIE_AER_UCE_RO | \
    PCIE_AER_UCE_FCP | PCIE_AER_UCE_SD | PCIE_AER_UCE_DLP | \
    PCIE_AER_UCE_TRAINING;
uint32_t pcie_aer_suce_severity = PCIE_AER_SUCE_SERR_ASSERT | \
    PCIE_AER_SUCE_UC_ADDR_ERR | PCIE_AER_SUCE_UC_ATTR_ERR | \
    PCIE_AER_SUCE_USC_MSG_DATA_ERR;

int pcie_disable_ari = 0;

/*
 * On some platforms, such as the AMD B450 chipset, we've seen an odd
 * relationship between enabling link bandwidth notifications and AERs about
 * ECRC errors. This provides a mechanism to disable it.
 */
int pcie_disable_lbw = 0;

/*
 * Amount of time to wait for an in-progress retraining. The default is to try
 * 500 times in 10ms chunks, thus a total of 5s.
 */
uint32_t pcie_link_retrain_count = 500;
uint32_t pcie_link_retrain_delay_ms = 10;

taskq_t *pcie_link_tq;
kmutex_t pcie_link_tq_mutex;

static int pcie_link_bw_intr(dev_info_t *);
static void pcie_capture_speeds(dev_info_t *);

dev_info_t *pcie_get_rc_dip(dev_info_t *dip);

/*
 * modload support
 */

static struct modlmisc modlmisc = {
	&mod_miscops,	/* Type of module */
	"PCI Express Framework Module"
};

static struct modlinkage modlinkage = {
	MODREV_1,
	(void *)&modlmisc,
	NULL
};

/*
 * Global Variables needed for a non-atomic version of ddi_fm_ereport_post.
 * Currently used to send the pci.fabric ereports whose payload depends on the
 * type of PCI device it is being sent for.
 */
char *pcie_nv_buf;
nv_alloc_t *pcie_nvap;
nvlist_t *pcie_nvl;

int
_init(void)
{
	int rval;

	pcie_nv_buf = kmem_alloc(ERPT_DATA_SZ, KM_SLEEP);
	pcie_nvap = fm_nva_xcreate(pcie_nv_buf, ERPT_DATA_SZ);
	pcie_nvl = fm_nvlist_create(pcie_nvap);
	mutex_init(&pcie_link_tq_mutex, NULL, MUTEX_DRIVER, NULL);

	if ((rval = mod_install(&modlinkage)) != 0) {
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_fini()
{
	int rval;

	if ((rval = mod_remove(&modlinkage)) == 0) {
		if (pcie_link_tq != NULL) {
			taskq_destroy(pcie_link_tq);
		}
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_info(struct modinfo *modinfop)
{
	return (mod_info(&modlinkage, modinfop));
}

/* ARGSUSED */
int
pcie_init(dev_info_t *dip, caddr_t arg)
{
	int ret = DDI_SUCCESS;

	/*
	 * Our _init function is too early to create a taskq. Create the pcie
	 * link management taskq here now instead.
	 */
	mutex_enter(&pcie_link_tq_mutex);
	if (pcie_link_tq == NULL) {
		pcie_link_tq = taskq_create("pcie_link", 1, minclsyspri, 0, 0,
		    0);
	}
	mutex_exit(&pcie_link_tq_mutex);

	/*
	 * Create a "devctl" minor node to support DEVCTL_DEVICE_*
	 * and DEVCTL_BUS_* ioctls to this bus.
	 */
	if ((ret = ddi_create_minor_node(dip, "devctl", S_IFCHR,
	    PCI_MINOR_NUM(ddi_get_instance(dip), PCI_DEVCTL_MINOR),
	    DDI_NT_NEXUS, 0)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to create devctl minor node for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if ((ret = pcie_hp_init(dip, arg)) != DDI_SUCCESS) {
		/*
		 * On some x86 platforms, we have observed unexpected hotplug
		 * initialization failures. The known cause is a platform
		 * issue: although the affected PCI bridges have the Hotplug
		 * Capable bit set, the machine does not implement the
		 * expected ACPI object.
		 *
		 * We don't want to stop PCI driver attach and system boot
		 * just because of this hotplug initialization failure, so
		 * print a debug message and continue.
		 */
		PCIE_DBG("%s%d: Failed setting hotplug framework\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

#if defined(__sparc)
		ddi_remove_minor_node(dip, "devctl");

		return (ret);
#endif /* defined(__sparc) */
	}

	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_uninit(dev_info_t *dip)
{
	int	ret = DDI_SUCCESS;

	if (pcie_ari_is_enabled(dip) == PCIE_ARI_FORW_ENABLED)
		(void) pcie_ari_disable(dip);

	if ((ret = pcie_hp_uninit(dip)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to uninitialize hotplug for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if (pcie_link_bw_supported(dip)) {
		(void) pcie_link_bw_disable(dip);
	}

	ddi_remove_minor_node(dip, "devctl");

	return (ret);
}

/*
 * PCIe module interface for enabling hotplug interrupts.
 *
 * It should be called after pcie_init() has completed and after the bus
 * driver's interrupt handlers have been attached.
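 *
 * For example, a nexus driver would be expected to order these calls
 * roughly as follows. This is an illustrative sketch derived only from the
 * ordering requirements stated here and for pcie_hpintr_disable(); how the
 * interrupt handlers themselves are added and removed is up to the bus
 * driver:
 *
 *	attach:	pcie_init(dip, arg)
 *		<add and enable the bus driver's interrupt handlers>
 *		pcie_hpintr_enable(dip)
 *
 *	detach:	pcie_hpintr_disable(dip)
 *		<disable and remove the bus driver's interrupt handlers>
 *		pcie_uninit(dip)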
 */
int
pcie_hpintr_enable(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t	*ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.enable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_enable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/*
 * PCIe module interface for disabling hotplug interrupts.
 *
 * It should be called before pcie_uninit() is called and before the bus
 * driver's interrupt handlers are detached.
 */
int
pcie_hpintr_disable(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t	*ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.disable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_disable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_intr(dev_info_t *dip)
{
	int hp, lbw;

	hp = pcie_hp_intr(dip);
	lbw = pcie_link_bw_intr(dip);

	if (hp == DDI_INTR_CLAIMED || lbw == DDI_INTR_CLAIMED) {
		return (DDI_INTR_CLAIMED);
	}

	return (DDI_INTR_UNCLAIMED);
}

/* ARGSUSED */
int
pcie_open(dev_info_t *dip, dev_t *devp, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);

	/*
	 * Make sure the open is for the right file type.
	 */
	if (otyp != OTYP_CHR)
		return (EINVAL);

	/*
	 * Handle the open by tracking the device state.
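	 *
	 * For example, starting from PCI_SOFT_STATE_CLOSED, the checks below
	 * give the following transitions:
	 *
	 *	open with FEXCL		-> PCI_SOFT_STATE_OPEN_EXCL; every
	 *				   further open fails with EBUSY until
	 *				   pcie_close().
	 *	open without FEXCL	-> PCI_SOFT_STATE_OPEN; further
	 *				   non-exclusive opens succeed, but an
	 *				   open with FEXCL fails with EBUSY.
	 *	pcie_close()		-> PCI_SOFT_STATE_CLOSED again.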
	 */
	if ((bus_p->bus_soft_state == PCI_SOFT_STATE_OPEN_EXCL) ||
	    ((flags & FEXCL) &&
	    (bus_p->bus_soft_state != PCI_SOFT_STATE_CLOSED))) {
		return (EBUSY);
	}

	if (flags & FEXCL)
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN_EXCL;
	else
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN;

	return (0);
}

/* ARGSUSED */
int
pcie_close(dev_info_t *dip, dev_t dev, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);

	if (otyp != OTYP_CHR)
		return (EINVAL);

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;

	return (0);
}

/* ARGSUSED */
int
pcie_ioctl(dev_info_t *dip, dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *credp, int *rvalp)
{
	struct devctl_iocdata	*dcp;
	uint_t			bus_state;
	int			rv = DDI_SUCCESS;

	/*
	 * We can use the generic implementation for these devctl ioctls.
	 */
	switch (cmd) {
	case DEVCTL_DEVICE_GETSTATE:
	case DEVCTL_DEVICE_ONLINE:
	case DEVCTL_DEVICE_OFFLINE:
	case DEVCTL_BUS_GETSTATE:
		return (ndi_devctl_ioctl(dip, cmd, arg, mode, 0));
	default:
		break;
	}

	/*
	 * Read the devctl ioctl data.
	 */
	if (ndi_dc_allochdl((void *)arg, &dcp) != NDI_SUCCESS)
		return (EFAULT);

	switch (cmd) {
	case DEVCTL_BUS_QUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_QUIESCED)
				break;
		(void) ndi_set_bus_state(dip, BUS_QUIESCED);
		break;
	case DEVCTL_BUS_UNQUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_ACTIVE)
				break;
		(void) ndi_set_bus_state(dip, BUS_ACTIVE);
		break;
	case DEVCTL_BUS_RESET:
	case DEVCTL_BUS_RESETALL:
	case DEVCTL_DEVICE_RESET:
		rv = ENOTSUP;
		break;
	default:
		rv = ENOTTY;
	}

	ndi_dc_freehdl(dcp);
	return (rv);
}

/* ARGSUSED */
int
pcie_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op,
    int flags, char *name, caddr_t valuep, int *lengthp)
{
	if (dev == DDI_DEV_T_ANY)
		goto skip;

	if (PCIE_IS_HOTPLUG_CAPABLE(dip) &&
	    strcmp(name, "pci-occupant") == 0) {
		int	pci_dev = PCI_MINOR_NUM_TO_PCI_DEVNUM(getminor(dev));

		pcie_hp_create_occupant_props(dip, dev, pci_dev);
	}

skip:
	return (ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));
}

int
pcie_init_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t		*bus_p;
	ddi_acc_handle_t	eh = NULL;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL)
		return (DDI_FAILURE);

	/* Create a config access handle dedicated to error handling */
	if (pci_config_setup(cdip, &eh) != DDI_SUCCESS) {
		cmn_err(CE_WARN, "Cannot setup config access"
		    " for BDF 0x%x\n", bus_p->bus_bdf);
		return (DDI_FAILURE);
	}

	bus_p->bus_cfg_hdl = eh;
	return (DDI_SUCCESS);
}

void
pcie_fini_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(cdip);

	pci_config_teardown(&bus_p->bus_cfg_hdl);
}

void
pcie_determine_serial(dev_info_t *dip)
{
	pcie_bus_t		*bus_p = PCIE_DIP2BUS(dip);
	ddi_acc_handle_t	h;
	uint16_t		cap;
	uchar_t			serial[8];
	uint32_t		low, high;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	h = bus_p->bus_cfg_hdl;

	if ((PCI_CAP_LOCATE(h, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_SER), &cap)) ==
	    DDI_FAILURE)
		return;

	high = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_UPPER_DW);
	low = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_LOWER_DW);

	/*
	 * Here, we're trying to figure out whether we had an invalid PCIe
	 * read. Looking only at the value, it can be hard to tell a register
	 * that legitimately contains all 1s from a failed read, so we only
	 * assume the serial number is invalid if both register reads are
	 * invalid.
	 * We also only use 32-bit reads, as we're not sure that all devices
	 * support these registers as 64-bit reads, while we know that they
	 * support them as 32-bit reads.
	 */
	if (high == PCI_EINVAL32 && low == PCI_EINVAL32)
		return;

	serial[0] = low & 0xff;
	serial[1] = (low >> 8) & 0xff;
	serial[2] = (low >> 16) & 0xff;
	serial[3] = (low >> 24) & 0xff;
	serial[4] = high & 0xff;
	serial[5] = (high >> 8) & 0xff;
	serial[6] = (high >> 16) & 0xff;
	serial[7] = (high >> 24) & 0xff;

	(void) ndi_prop_update_byte_array(DDI_DEV_T_NONE, dip, "pcie-serial",
	    serial, sizeof (serial));
}

static void
pcie_determine_aspm(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	uint32_t	linkcap;
	uint16_t	linkctl;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);

	switch (linkcap & PCIE_LINKCAP_ASPM_SUP_MASK) {
	case PCIE_LINKCAP_ASPM_SUP_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l1");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L0S_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s,l1");
		break;
	default:
		return;
	}

	switch (linkctl & PCIE_LINKCTL_ASPM_CTL_MASK) {
	case PCIE_LINKCTL_ASPM_CTL_DIS:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "disabled");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l1");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s,l1");
		break;
	}
}

/*
 * PCI-Express child device initialization. Note that this is only called on a
 * device or function if we actually attach a device driver to it.
 *
 * This function enables generic PCI Express interrupts and error handling.
 * Tagging, the maximum payload size, and related settings are all configured
 * before this point, in pcie_fabric_setup().
 *
 * @param cdip	child's dip (device's dip)
 * @return	DDI_SUCCESS or DDI_FAILURE
 */
/* ARGSUSED */
int
pcie_initchild(dev_info_t *cdip)
{
	uint16_t	tmp16, reg16;
	pcie_bus_t	*bus_p;
	uint32_t	devid, venid;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL) {
		PCIE_DBG("%s: BUS not found.\n",
		    ddi_driver_name(cdip));

		return (DDI_FAILURE);
	}

	if (pcie_init_cfghdl(cdip) != DDI_SUCCESS)
		return (DDI_FAILURE);

	/*
	 * Update pcie_bus_t with the real vendor ID and device ID.
	 *
	 * For devices assigned in an IOV environment, OBP returns faked
	 * device and vendor IDs both on configuration reads and in the
	 * corresponding properties in the root domain. On such platforms,
	 * translate_devid() updates the properties with the real
	 * device-id/vendor-id, so that we can use the properties here to get
	 * the real IDs and overwrite the faked ones.
	 *
	 * For unassigned devices, or devices in a non-IOV environment, the
	 * operation below makes no difference.
	 *
	 * The IOV implementation only supports assignment of PCIe endpoint
	 * devices. Devices under PCI-PCI bridges don't need this treatment.
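	 *
	 * The IDs are packed with the device ID in the upper 16 bits and the
	 * vendor ID in the lower 16 bits. For example, a vendor-id of 0x10de
	 * and a device-id of 0x0370 yield a bus_dev_ven_id of 0x037010de, the
	 * value checked by the Nvidia bridge workaround below. If either
	 * property is missing, ddi_prop_get_int() returns the default of -1,
	 * so the corresponding half reads as 0xffff.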
	 */
	devid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "device-id", -1);
	venid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "vendor-id", -1);
	bus_p->bus_dev_ven_id = (devid << 16) | (venid & 0xffff);

	/* Clear the device's status register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_STAT);
	PCIE_PUT(16, bus_p, PCI_CONF_STAT, reg16);

	/* Set up the device's command register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_COMM);
	tmp16 = (reg16 & pcie_command_default_fw) | pcie_command_default;

#if defined(__x86)
	boolean_t empty_io_range = B_FALSE;
	boolean_t empty_mem_range = B_FALSE;
	/*
	 * Check for empty I/O and memory ranges on bridges. If a range is
	 * empty, disable the corresponding access, as leaving it enabled can
	 * cause a hang.
	 */
	pcie_check_io_mem_range(bus_p->bus_cfg_hdl, &empty_io_range,
	    &empty_mem_range);
	if ((empty_io_range == B_TRUE) &&
	    (pcie_command_default & PCI_COMM_IO)) {
		tmp16 &= ~PCI_COMM_IO;
		PCIE_DBG("No I/O range found for %s, bdf 0x%x\n",
		    ddi_driver_name(cdip), bus_p->bus_bdf);
	}
	if ((empty_mem_range == B_TRUE) &&
	    (pcie_command_default & PCI_COMM_MAE)) {
		tmp16 &= ~PCI_COMM_MAE;
		PCIE_DBG("No Mem range found for %s, bdf 0x%x\n",
		    ddi_driver_name(cdip), bus_p->bus_bdf);
	}
#endif /* defined(__x86) */

	if (pcie_serr_disable_flag && PCIE_IS_PCIE(bus_p))
		tmp16 &= ~PCI_COMM_SERR_ENABLE;

	PCIE_PUT(16, bus_p, PCI_CONF_COMM, tmp16);
	PCIE_DBG_CFG(cdip, bus_p, "COMMAND", 16, PCI_CONF_COMM, reg16);

	/*
	 * If the device has a bridge control register, program it based on
	 * the settings in the command register.
	 */
	if (PCIE_IS_BDG(bus_p)) {
		/* Clear the device's secondary status register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_SEC_STATUS);
		PCIE_PUT(16, bus_p, PCI_BCNF_SEC_STATUS, reg16);

		/* Set up the device's bridge control register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_BCNTRL);
		tmp16 = (reg16 & pcie_bdg_command_default_fw);

		tmp16 |= PCI_BCNF_BCNTRL_SERR_ENABLE;
		/*
		 * Workaround for an Nvidia bridge (bus_dev_ven_id 0x037010DE):
		 * don't set the SERR enable bit in the bridge control
		 * register, as doing so could lead to bogus NMIs.
		 */
		if (bus_p->bus_dev_ven_id == 0x037010DE)
			tmp16 &= ~PCI_BCNF_BCNTRL_SERR_ENABLE;

		if (pcie_command_default & PCI_COMM_PARITY_DETECT)
			tmp16 |= PCI_BCNF_BCNTRL_PARITY_ENABLE;

		/*
		 * Enable Master Abort Mode only if URs have not been masked.
		 * For PCI and PCIe-PCI bridges, setting this bit causes Master
		 * Aborts/URs to be forwarded as a UR/TA or SERR. If this bit
		 * is clear, posted requests are dropped and non-posted
		 * requests are returned with -1 (all 1s).
		 */
		if (pcie_aer_uce_mask & PCIE_AER_UCE_UR)
			tmp16 &= ~PCI_BCNF_BCNTRL_MAST_AB_MODE;
		else
			tmp16 |= PCI_BCNF_BCNTRL_MAST_AB_MODE;
		PCIE_PUT(16, bus_p, PCI_BCNF_BCNTRL, tmp16);
		PCIE_DBG_CFG(cdip, bus_p, "SEC CMD", 16, PCI_BCNF_BCNTRL,
		    reg16);
	}

	if (PCIE_IS_PCIE(bus_p)) {
		/* Set up the PCIe device control register */
		reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
		/* note: MPS/MRRS are initialized in pcie_initchild_mps() */
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK));
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(cdip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);

		/* Enable PCIe errors */
		pcie_enable_errors(cdip);

		pcie_determine_serial(cdip);

		pcie_determine_aspm(cdip);

		pcie_capture_speeds(cdip);
	}

	bus_p->bus_ari = B_FALSE;
	if ((pcie_ari_is_enabled(ddi_get_parent(cdip))
	    == PCIE_ARI_FORW_ENABLED) && (pcie_ari_device(cdip)
	    == PCIE_ARI_DEVICE)) {
		bus_p->bus_ari = B_TRUE;
	}

	return (DDI_SUCCESS);
}

static void
pcie_init_pfd(dev_info_t *dip)
{
	pf_data_t	*pfd_p = PCIE_ZALLOC(pf_data_t);
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);

	PCIE_DIP2PFD(dip) = pfd_p;

	pfd_p->pe_bus_p = bus_p;
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	/* Allocate the root fault struct for both RC and RP */
	if (PCIE_IS_ROOT(bus_p)) {
		PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
		PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
		PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t);
	}

	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;

	if (PCIE_IS_BDG(bus_p))
		PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);

	if (PCIE_IS_PCIE(bus_p)) {
		PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);

		if (PCIE_IS_RP(bus_p))
			PCIE_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_rp_err_regs_t);

		PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
		PCIE_ADV_REG(pfd_p)->pcie_ue_tgt_bdf = PCIE_INVALID_BDF;

		if (PCIE_IS_RP(bus_p)) {
			PCIE_ADV_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id =
			    PCIE_INVALID_BDF;
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id =
			    PCIE_INVALID_BDF;
		} else if (PCIE_IS_PCIE_BDG(bus_p)) {
			PCIE_ADV_BDG_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_bdg_err_regs_t);
			PCIE_ADV_BDG_REG(pfd_p)->pcie_sue_tgt_bdf =
			    PCIE_INVALID_BDF;
		}

		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		}

		PCIE_SLOT_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_slot_regs_t);
		PCIE_SLOT_REG(pfd_p)->pcie_slot_regs_valid = B_FALSE;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_cap = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_control = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_status = 0;

	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		} else {
			PCIX_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcix_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p))
				PCIX_ECC_REG(pfd_p) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
		}
	}
}

static void
pcie_fini_pfd(dev_info_t *dip)
{
	pf_data_t	*pfd_p = PCIE_DIP2PFD(dip);
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);

	if (PCIE_IS_PCIE(bus_p)) {
		/* Allocated for every PCIe device in pcie_init_pfd() */
		kmem_free(PCIE_SLOT_REG(pfd_p), sizeof (pf_pcie_slot_regs_t));

		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		}

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_ADV_RP_REG(pfd_p),
			    sizeof (pf_pcie_adv_rp_err_regs_t));
		else if (PCIE_IS_PCIE_BDG(bus_p))
			kmem_free(PCIE_ADV_BDG_REG(pfd_p),
			    sizeof (pf_pcie_adv_bdg_err_regs_t));

		kmem_free(PCIE_ADV_REG(pfd_p),
		    sizeof (pf_pcie_adv_err_regs_t));

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_RP_REG(pfd_p),
			    sizeof (pf_pcie_rp_err_regs_t));

		kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t));
	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		} else {
			if (PCIX_ECC_VERSION_CHECK(bus_p))
				kmem_free(PCIX_ECC_REG(pfd_p),
				    sizeof (pf_pcix_ecc_regs_t));

			kmem_free(PCIX_ERR_REG(pfd_p),
			    sizeof (pf_pcix_err_regs_t));
		}
	}

	if (PCIE_IS_BDG(bus_p))
		kmem_free(PCI_BDG_ERR_REG(pfd_p),
		    sizeof (pf_pci_bdg_err_regs_t));

	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));

	if (PCIE_IS_ROOT(bus_p)) {
		kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
		kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
	}

	kmem_free(PCIE_DIP2PFD(dip), sizeof (pf_data_t));

	PCIE_DIP2PFD(dip) = NULL;
}

/*
 * Special functions to allocate pf_data_t's for PCIe root complexes.
 * Note: Root Complex, not Root Port.
 */
void
pcie_rc_init_pfd(dev_info_t *dip, pf_data_t *pfd_p)
{
	pfd_p->pe_bus_p = PCIE_DIP2DOWNBUS(dip);
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
	PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
	PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t);
	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;
	PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);
	PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);
	PCIE_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_rp_err_regs_t);
	PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id = PCIE_INVALID_BDF;
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id = PCIE_INVALID_BDF;

	PCIE_ADV_REG(pfd_p)->pcie_ue_sev = pcie_aer_uce_severity;
}

void
pcie_rc_fini_pfd(pf_data_t *pfd_p)
{
	kmem_free(PCIE_ADV_RP_REG(pfd_p), sizeof (pf_pcie_adv_rp_err_regs_t));
	kmem_free(PCIE_ADV_REG(pfd_p), sizeof (pf_pcie_adv_err_regs_t));
	kmem_free(PCIE_RP_REG(pfd_p), sizeof (pf_pcie_rp_err_regs_t));
	kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t));
	kmem_free(PCI_BDG_ERR_REG(pfd_p), sizeof (pf_pci_bdg_err_regs_t));
	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));
	kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
	kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
}

/*
 * Initialize the pcie_bus_t for a root complex.
 *
 * Only a few of the fields in pcie_bus_t are valid for a root complex.
 * The fields that are bracketed are initialized in this routine:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		<bus_fm_flags>
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		bus_pcie_off
 * uint16_t		<bus_aer_off>
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t	*	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	<bus_dom>
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void	*		bus_plat_private
 */
void
pcie_rc_init_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p;

	bus_p = (pcie_bus_t *)kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);
	bus_p->bus_dip = dip;
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_RC_PSEUDO;
	bus_p->bus_hdr_type = PCI_HEADER_ONE;

	/* Pretend that AER logs are present */
	bus_p->bus_aer_off = (uint16_t)-1;

	/* Needed only for handle lookup */
	atomic_or_uint(&bus_p->bus_fm_flags, PF_FM_READY);

	ndi_set_bus_private(dip, B_FALSE, DEVI_PORT_TYPE_PCI, bus_p);

	PCIE_BUS2DOM(bus_p) = PCIE_ZALLOC(pcie_domain_t);
}

void
pcie_rc_fini_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2DOWNBUS(dip);

	ndi_set_bus_private(dip, B_FALSE, 0, NULL);
	kmem_free(PCIE_BUS2DOM(bus_p), sizeof (pcie_domain_t));
	kmem_free(bus_p, sizeof (pcie_bus_t));
}

static int
pcie_width_to_int(pcie_link_width_t width)
{
	switch (width) {
	case PCIE_LINK_WIDTH_X1:
		return (1);
	case PCIE_LINK_WIDTH_X2:
		return (2);
	case PCIE_LINK_WIDTH_X4:
		return (4);
	case PCIE_LINK_WIDTH_X8:
		return (8);
	case PCIE_LINK_WIDTH_X12:
		return (12);
	case PCIE_LINK_WIDTH_X16:
		return (16);
	case PCIE_LINK_WIDTH_X32:
		return (32);
	default:
		return (0);
	}
}

/*
 * Return the speed in transfers per second. This is a signed quantity to
 * match the NDI/DDI property interfaces.
 */
static int64_t
pcie_speed_to_int(pcie_link_speed_t speed)
{
	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		return (2500000000LL);
	case PCIE_LINK_SPEED_5:
		return (5000000000LL);
	case PCIE_LINK_SPEED_8:
		return (8000000000LL);
	case PCIE_LINK_SPEED_16:
		return (16000000000LL);
	case PCIE_LINK_SPEED_32:
		return (32000000000LL);
	case PCIE_LINK_SPEED_64:
		return (64000000000LL);
	default:
		return (0);
	}
}

/*
 * Translate the recorded speed information into devinfo properties.
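 *
 * For example, consider a device whose Link Capabilities advertise a maximum
 * of x8 at 16 GT/s, whose Link Capabilities 2 register lists every speed up
 * to 16 GT/s, and whose link trained at x4 and 8 GT/s. Based on the
 * conversions above, one would expect the following properties, among others
 * (illustrative values, not taken from a particular device):
 *
 *	pcie-link-maximum-width		8
 *	pcie-link-current-width		4
 *	pcie-link-maximum-speed		16000000000
 *	pcie-link-current-speed		8000000000
 *	pcie-link-supported-speeds	2500000000, 5000000000, 8000000000,
 *					16000000000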
 */
static void
pcie_speeds_to_devinfo(dev_info_t *dip, pcie_bus_t *bus_p)
{
	if (bus_p->bus_max_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-width",
		    pcie_width_to_int(bus_p->bus_max_width));
	}

	if (bus_p->bus_cur_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-width",
		    pcie_width_to_int(bus_p->bus_cur_width));
	}

	if (bus_p->bus_cur_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-speed",
		    pcie_speed_to_int(bus_p->bus_cur_speed));
	}

	if (bus_p->bus_max_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-speed",
		    pcie_speed_to_int(bus_p->bus_max_speed));
	}

	if (bus_p->bus_target_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-target-speed",
		    pcie_speed_to_int(bus_p->bus_target_speed));
	}

	if ((bus_p->bus_speed_flags & PCIE_LINK_F_ADMIN_TARGET) != 0) {
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "pcie-link-admin-target-speed");
	}

	if (bus_p->bus_sup_speed != PCIE_LINK_SPEED_UNKNOWN) {
		int64_t speeds[PCIE_NSPEEDS];
		uint_t nspeeds = 0;

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_2_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_2_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_8) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_8);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_16) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_16);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_32) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_32);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_64) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_64);
		}

		(void) ndi_prop_update_int64_array(DDI_DEV_T_NONE, dip,
		    "pcie-link-supported-speeds", speeds, nspeeds);
	}
}

/*
 * We need to capture the supported, maximum, and current device speed and
 * width. The way that this has been done has changed over time.
 *
 * Prior to PCIe Gen 3, there were only current and supported speed fields.
 * These were found in the Link Status and Link Capabilities registers of the
 * PCI Express capability. With PCIe Gen 3, the speed field in the Link
 * Capabilities register changed to hold the maximum speed, and the supported
 * speeds vector moved to the Link Capabilities 2 register.
 *
 * However, a device may not implement all of these registers. To determine
 * whether Link Capabilities 2 is present, we first check the version of the
 * PCI Express capability: the register did not exist prior to version 2 of
 * this capability. A modern device that does not implement it is supposed to
 * return zero for the register.
 */
static void
pcie_capture_speeds(dev_info_t *dip)
{
	uint16_t	vers, status;
	uint32_t	cap, cap2, ctl2;
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	dev_info_t	*rcdip;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	rcdip = pcie_get_rc_dip(dip);
	if (bus_p->bus_cfg_hdl == NULL) {
		vers = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_PCIECAP);
	} else {
		vers = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP);
	}
	if (vers == PCI_EINVAL16)
		return;
	vers &= PCIE_PCIECAP_VER_MASK;

	/*
	 * Verify the capability's version.
	 */
	switch (vers) {
	case PCIE_PCIECAP_VER_1_0:
		cap2 = 0;
		ctl2 = 0;
		break;
	case PCIE_PCIECAP_VER_2_0:
		if (bus_p->bus_cfg_hdl == NULL) {
			cap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCAP2);
			ctl2 = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCTL2);
		} else {
			cap2 = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP2);
			ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
		}
		if (cap2 == PCI_EINVAL32)
			cap2 = 0;
		if (ctl2 == PCI_EINVAL16)
			ctl2 = 0;
		break;
	default:
		/* Don't try to handle an unknown version */
		return;
	}

	if (bus_p->bus_cfg_hdl == NULL) {
		status = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKSTS);
		cap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKCAP);
	} else {
		status = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		cap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	}
	if (status == PCI_EINVAL16 || cap == PCI_EINVAL32)
		return;

	mutex_enter(&bus_p->bus_speed_mutex);

	switch (status & PCIE_LINKSTS_SPEED_MASK) {
	case PCIE_LINKSTS_SPEED_2_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKSTS_SPEED_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKSTS_SPEED_8:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKSTS_SPEED_16:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKSTS_SPEED_32:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKSTS_SPEED_64:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	switch (status & PCIE_LINKSTS_NEG_WIDTH_MASK) {
	case PCIE_LINKSTS_NEG_WIDTH_X1:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X2:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X4:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X8:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X12:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X16:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X32:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	switch (cap & PCIE_LINKCAP_MAX_WIDTH_MASK) {
	case PCIE_LINKCAP_MAX_WIDTH_X1:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X2:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X4:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X8:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X12:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X16:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X32:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	/*
	 * If we have Link Capabilities 2, we can get the supported speeds from
	 * it and treat the speed field in Link Capabilities as the maximum. If
	 * we don't, we need to follow the Implementation Note under Link
	 * Capabilities 2 in the PCIe specification. Effectively, this means
	 * that if the Link Capabilities speed field holds the value 10b, the
	 * device supports both 2.5 and 5 GT/s.
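	 *
	 * For example, when cap2 is zero, a Max Link Speed field of 10b
	 * results in bus_max_speed being 5 GT/s and bus_sup_speed covering
	 * both 2.5 and 5 GT/s, while a field of 01b results in 2.5 GT/s for
	 * both. This assumes that PCIE_LINKCAP_MAX_SPEED_5 and
	 * PCIE_LINKCAP_MAX_SPEED_2_5 encode 10b and 01b respectively, as the
	 * specification defines them.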
	 */
	if (cap2 != 0) {
		if (cap2 & PCIE_LINKCAP2_SPEED_2_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_2_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_8)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_8;
		if (cap2 & PCIE_LINKCAP2_SPEED_16)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_16;
		if (cap2 & PCIE_LINKCAP2_SPEED_32)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_32;
		if (cap2 & PCIE_LINKCAP2_SPEED_64)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_64;

		switch (cap & PCIE_LINKCAP_MAX_SPEED_MASK) {
		case PCIE_LINKCAP_MAX_SPEED_2_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_8:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_8;
			break;
		case PCIE_LINKCAP_MAX_SPEED_16:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_16;
			break;
		case PCIE_LINKCAP_MAX_SPEED_32:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_32;
			break;
		case PCIE_LINKCAP_MAX_SPEED_64:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_64;
			break;
		default:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_UNKNOWN;
			break;
		}
	} else {
		if (cap & PCIE_LINKCAP_MAX_SPEED_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5 |
			    PCIE_LINK_SPEED_5;
		} else if (cap & PCIE_LINKCAP_MAX_SPEED_2_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5;
		}
	}

	switch (ctl2 & PCIE_LINKCTL2_TARGET_SPEED_MASK) {
	case PCIE_LINKCTL2_TARGET_SPEED_2_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_8:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_16:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_32:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_64:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	pcie_speeds_to_devinfo(dip, bus_p);
	mutex_exit(&bus_p->bus_speed_mutex);
}

/*
 * partially init pcie_bus_t for device (dip,bdf) for accessing pci
 * config space
 *
 * This routine is invoked during boot, either after creating a devinfo node
 * (x86 case) or during px driver attach (sparc case); it is also invoked
 * in hotplug context after a devinfo node is created.
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_INITIAL
 * is set:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		<bus_rp_dip>
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	<bus_bdf>
 * pcie_req_id_t	<bus_rp_bdf>
 * uint32_t		<bus_dev_ven_id>
 * uint8_t		<bus_rev_id>
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		<bus_pcie_off>
 * uint16_t		<bus_aer_off>
 * uint16_t		<bus_pcix_off>
 * uint16_t		<bus_ecc_ver>
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t *	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void *		bus_plat_private
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_FINAL
 * is set:
 *
 * dev_info_t *		bus_dip
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		bus_hdr_type
 * uint16_t		bus_dev_type
 * uint8_t		<bus_bdg_secbus>
 * uint16_t		bus_pcie_off
 * uint16_t		bus_aer_off
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	<bus_bus_range>
 * ppb_ranges_t *	<bus_addr_ranges>
 * int			<bus_addr_entries>
 * pci_regspec_t *	<bus_assigned_addr>
 * int			<bus_assigned_entries>
 * pf_data_t *		<bus_pfd>
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void *		<bus_plat_private>
 */

pcie_bus_t *
pcie_init_bus(dev_info_t *dip, pcie_req_id_t bdf, uint8_t flags)
{
	uint16_t	status, base, baseptr, num_cap;
	uint32_t	capid;
	int		range_size;
	pcie_bus_t	*bus_p = NULL;
	dev_info_t	*rcdip;
	dev_info_t	*pdip;
	const char	*errstr = NULL;

	if (!(flags & PCIE_BUS_INITIAL))
		goto initial_done;

	bus_p = kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);

	bus_p->bus_dip = dip;
	bus_p->bus_bdf = bdf;

	rcdip = pcie_get_rc_dip(dip);
	ASSERT(rcdip != NULL);

	/* Save the Vendor ID, Device ID and revision ID */
	bus_p->bus_dev_ven_id = pci_cfgacc_get32(rcdip, bdf, PCI_CONF_VENID);
	bus_p->bus_rev_id = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_REVID);
	/* Save the Header Type */
	bus_p->bus_hdr_type = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_HEADER);
	bus_p->bus_hdr_type &= PCI_HEADER_TYPE_M;

	/*
	 * Figure out the device type and all the relevant capability offsets
	 */
	/* set default value */
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_PCI_PSEUDO;

	status = pci_cfgacc_get16(rcdip, bdf, PCI_CONF_STAT);
	if (status == PCI_CAP_EINVAL16 || !(status & PCI_STAT_CAP))
		goto caps_done; /* capability not supported */

	/* Relevant conventional capabilities first */

	/* Conventional caps: PCI_CAP_ID_PCI_E, PCI_CAP_ID_PCIX */
	num_cap = 2;

	switch (bus_p->bus_hdr_type) {
	case PCI_HEADER_ZERO:
		baseptr = PCI_CONF_CAP_PTR;
		break;
	case PCI_HEADER_PPB:
		baseptr = PCI_BCNF_CAP_PTR;
		break;
	case PCI_HEADER_CARDBUS:
		baseptr = PCI_CBUS_CAP_PTR;
		break;
	default:
		cmn_err(CE_WARN, "%s: unexpected pci header type:%x",
		    __func__, bus_p->bus_hdr_type);
		goto caps_done;
	}

	base = baseptr;
	for (base = pci_cfgacc_get8(rcdip, bdf, base); base && num_cap;
	    base = pci_cfgacc_get8(rcdip, bdf, base + PCI_CAP_NEXT_PTR)) {
		capid = pci_cfgacc_get8(rcdip, bdf, base);
		uint16_t pcap;

		switch (capid) {
		case PCI_CAP_ID_PCI_E:
			bus_p->bus_pcie_off = base;
			pcap = pci_cfgacc_get16(rcdip, bdf, base +
			    PCIE_PCIECAP);
			bus_p->bus_dev_type = pcap & PCIE_PCIECAP_DEV_TYPE_MASK;
			bus_p->bus_pcie_vers = pcap & PCIE_PCIECAP_VER_MASK;

			/* Check and save PCIe hotplug capability information */
			if ((PCIE_IS_RP(bus_p) || PCIE_IS_SWD(bus_p)) &&
			    (pci_cfgacc_get16(rcdip, bdf, base + PCIE_PCIECAP)
			    & PCIE_PCIECAP_SLOT_IMPL) &&
			    (pci_cfgacc_get32(rcdip, bdf, base + PCIE_SLOTCAP)
			    & PCIE_SLOTCAP_HP_CAPABLE))
				bus_p->bus_hp_sup_modes |= PCIE_NATIVE_HP_MODE;

			num_cap--;
			break;
		case PCI_CAP_ID_PCIX:
			bus_p->bus_pcix_off = base;
			if (PCIE_IS_BDG(bus_p))
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_SEC_STATUS) & PCI_PCIX_VER_MASK;
			else
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_COMMAND) & PCI_PCIX_VER_MASK;
			num_cap--;
			break;
		default:
			break;
		}
	}

	/* Check and save PCI hotplug (SHPC) capability information */
	if (PCIE_IS_BDG(bus_p)) {
		base = baseptr;
		for (base = pci_cfgacc_get8(rcdip, bdf, base);
		    base; base = pci_cfgacc_get8(rcdip, bdf,
		    base + PCI_CAP_NEXT_PTR)) {
			capid = pci_cfgacc_get8(rcdip, bdf, base);
			if (capid == PCI_CAP_ID_PCI_HOTPLUG) {
				bus_p->bus_pci_hp_off = base;
				bus_p->bus_hp_sup_modes |= PCIE_PCI_HP_MODE;
				break;
			}
		}
	}

	/* Then, relevant extended capabilities */

	if (!PCIE_IS_PCIE(bus_p))
		goto caps_done;

	/* Extended caps: PCIE_EXT_CAP_ID_AER, PCIE_EXT_CAP_ID_DEV3 */
	for (base = PCIE_EXT_CAP; base; base = (capid >>
	    PCIE_EXT_CAP_NEXT_PTR_SHIFT) & PCIE_EXT_CAP_NEXT_PTR_MASK) {
		capid = pci_cfgacc_get32(rcdip, bdf, base);
		if (capid == PCI_CAP_EINVAL32)
			break;
		switch ((capid >> PCIE_EXT_CAP_ID_SHIFT) &
		    PCIE_EXT_CAP_ID_MASK) {
		case PCIE_EXT_CAP_ID_AER:
			bus_p->bus_aer_off = base;
			break;
		case PCIE_EXT_CAP_ID_DEV3:
			bus_p->bus_dev3_off = base;
			break;
		}
	}

caps_done:
	/* save RP dip and RP bdf */
	if (PCIE_IS_RP(bus_p)) {
		bus_p->bus_rp_dip = dip;
		bus_p->bus_rp_bdf = bus_p->bus_bdf;

		bus_p->bus_fab = PCIE_ZALLOC(pcie_fabric_data_t);
	} else {
		for (pdip = ddi_get_parent(dip); pdip;
		    pdip = ddi_get_parent(pdip)) {
			pcie_bus_t *parent_bus_p = PCIE_DIP2BUS(pdip);

			/*
			 * If RP dip and RP bdf in parent's bus_t have
			 * been initialized, simply use these instead of
			 * continuing up to the RC.
			 */
			if (parent_bus_p->bus_rp_dip != NULL) {
				bus_p->bus_rp_dip = parent_bus_p->bus_rp_dip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_rp_bdf;
				break;
			}

			/*
			 * When debugging, be aware that some NVIDIA x86
			 * architectures have 2 nodes for each RP: one at Bus
			 * 0x0 and one at Bus 0x80. The requester is from Bus
			 * 0x80.
			 */
			if (PCIE_IS_ROOT(parent_bus_p)) {
				bus_p->bus_rp_dip = pdip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_bdf;
				break;
			}
		}
	}

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;
	(void) atomic_swap_uint(&bus_p->bus_fm_flags, 0);

	ndi_set_bus_private(dip, B_TRUE, DEVI_PORT_TYPE_PCI, (void *)bus_p);

	if (PCIE_IS_HOTPLUG_CAPABLE(dip))
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "hotplug-capable");

initial_done:
	if (!(flags & PCIE_BUS_FINAL))
		goto final_done;

	/* already initialized? */
	bus_p = PCIE_DIP2BUS(dip);

	/* Save the Range information if device is a switch/bridge */
	if (PCIE_IS_BDG(bus_p)) {
		/* get "bus-range" property */
		range_size = sizeof (pci_bus_range_t);
		if (ddi_getlongprop_buf(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "bus-range", (caddr_t)&bus_p->bus_bus_range, &range_size)
		    != DDI_PROP_SUCCESS) {
			errstr = "Cannot find \"bus-range\" property";
			cmn_err(CE_WARN,
			    "PCIE init err info failed BDF 0x%x:%s\n",
			    bus_p->bus_bdf, errstr);
		}

		/* get secondary bus number */
		rcdip = pcie_get_rc_dip(dip);
		ASSERT(rcdip != NULL);

		bus_p->bus_bdg_secbus = pci_cfgacc_get8(rcdip,
		    bus_p->bus_bdf, PCI_BCNF_SECBUS);

		/* Get "ranges" property */
		if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "ranges", (caddr_t)&bus_p->bus_addr_ranges,
		    &bus_p->bus_addr_entries) != DDI_PROP_SUCCESS)
			bus_p->bus_addr_entries = 0;
		bus_p->bus_addr_entries /= sizeof (ppb_ranges_t);
	}

	/* save "assigned-addresses" property array, ignore failures */
	if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "assigned-addresses", (caddr_t)&bus_p->bus_assigned_addr,
	    &bus_p->bus_assigned_entries) == DDI_PROP_SUCCESS)
		bus_p->bus_assigned_entries /= sizeof (pci_regspec_t);
	else
		bus_p->bus_assigned_entries = 0;

	pcie_init_pfd(dip);

	pcie_init_plat(dip);

	pcie_capture_speeds(dip);

final_done:

	PCIE_DBG("Add %s(dip 0x%p, bdf 0x%x, secbus 0x%x)\n",
	    ddi_driver_name(dip), (void *)dip, bus_p->bus_bdf,
	    bus_p->bus_bdg_secbus);
#ifdef DEBUG
	if (bus_p != NULL) {
		pcie_print_bus(bus_p);
	}
#endif

	return (bus_p);
}

/*
 * Invoked before destroying a devinfo node, mostly during hotplug
 * operations, to free the pcie_bus_t data structure. PCIE_BUS_INITIAL
 * releases the resources collected by the PCIE_BUS_FINAL phase of
 * pcie_init_bus() (plus any root port fabric data), while PCIE_BUS_FINAL
 * removes the "hotplug-capable" property and frees the pcie_bus_t itself.
 */
/* ARGSUSED */
void
pcie_fini_bus(dev_info_t *dip, uint8_t flags)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	ASSERT(bus_p);

	if (flags & PCIE_BUS_INITIAL) {
		pcie_fini_plat(dip);
		pcie_fini_pfd(dip);

		if (PCIE_IS_RP(bus_p)) {
			kmem_free(bus_p->bus_fab, sizeof (pcie_fabric_data_t));
			bus_p->bus_fab = NULL;
		}

		kmem_free(bus_p->bus_assigned_addr,
		    (sizeof (pci_regspec_t) * bus_p->bus_assigned_entries));
		kmem_free(bus_p->bus_addr_ranges,
		    (sizeof (ppb_ranges_t) * bus_p->bus_addr_entries));
		/* zero out the fields that have been destroyed */
		bus_p->bus_assigned_addr = NULL;
		bus_p->bus_addr_ranges = NULL;
		bus_p->bus_assigned_entries = 0;
		bus_p->bus_addr_entries = 0;
	}

	if (flags & PCIE_BUS_FINAL) {
		if (PCIE_IS_HOTPLUG_CAPABLE(dip)) {
			(void) ndi_prop_remove(DDI_DEV_T_NONE, dip,
			    "hotplug-capable");
		}

		ndi_set_bus_private(dip, B_TRUE, 0, NULL);
		kmem_free(bus_p, sizeof (pcie_bus_t));
	}
}

int
pcie_postattach_child(dev_info_t *cdip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip);

	if (!bus_p)
		return (DDI_FAILURE);

	return (pcie_enable_ce(cdip));
}

/*
 * PCI-Express child device de-initialization.
 * This function disables generic pci-express interrupts and error
 * handling.
 */
void
pcie_uninitchild(dev_info_t *cdip)
{
	pcie_disable_errors(cdip);
	pcie_fini_cfghdl(cdip);
	pcie_fini_dom(cdip);
}

/*
 * find the root complex dip
 */
dev_info_t *
pcie_get_rc_dip(dev_info_t *dip)
{
	dev_info_t *rcdip;
	pcie_bus_t *rc_bus_p;

	for (rcdip = ddi_get_parent(dip); rcdip;
	    rcdip = ddi_get_parent(rcdip)) {
		rc_bus_p = PCIE_DIP2BUS(rcdip);
		if (rc_bus_p && PCIE_IS_RC(rc_bus_p))
			break;
	}

	return (rcdip);
}

boolean_t
pcie_is_pci_device(dev_info_t *dip)
{
	dev_info_t	*pdip;
	char		*device_type;

	pdip = ddi_get_parent(dip);
	if (pdip == NULL)
		return (B_FALSE);

	if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, DDI_PROP_DONTPASS,
	    "device_type", &device_type) != DDI_PROP_SUCCESS)
		return (B_FALSE);

	if (strcmp(device_type, "pciex") != 0 &&
	    strcmp(device_type, "pci") != 0) {
		ddi_prop_free(device_type);
		return (B_FALSE);
	}

	ddi_prop_free(device_type);
	return (B_TRUE);
}

typedef struct {
	boolean_t	init;
	uint8_t		flags;
} pcie_bus_arg_t;

/*ARGSUSED*/
static int
pcie_fab_do_init_fini(dev_info_t *dip, void *arg)
{
	pcie_req_id_t	bdf;
	pcie_bus_arg_t	*bus_arg = (pcie_bus_arg_t *)arg;

	if (!pcie_is_pci_device(dip))
		goto out;

	if (bus_arg->init) {
		if (pcie_get_bdf_from_dip(dip, &bdf) != DDI_SUCCESS)
			goto out;

		(void) pcie_init_bus(dip, bdf, bus_arg->flags);
	} else {
		(void) pcie_fini_bus(dip, bus_arg->flags);
	}

	return (DDI_WALK_CONTINUE);

out:
	return (DDI_WALK_PRUNECHILD);
}

void
pcie_fab_init_bus(dev_info_t *rcdip, uint8_t flags)
{
	dev_info_t	*dip = ddi_get_child(rcdip);
	pcie_bus_arg_t	arg;

	arg.init = B_TRUE;
	arg.flags = flags;

	ndi_devi_enter(rcdip);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip);
}

void
pcie_fab_fini_bus(dev_info_t *rcdip, uint8_t flags)
{
	dev_info_t	*dip = ddi_get_child(rcdip);
	pcie_bus_arg_t	arg;

	arg.init = B_FALSE;
	arg.flags = flags;

	ndi_devi_enter(rcdip);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip);
}

void
pcie_enable_errors(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	uint16_t	reg16, tmp16;
	uint32_t	reg32, tmp32;

	ASSERT(bus_p);

	/*
	 * Clear any pending errors
	 */
	pcie_clear_errors(dip);

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Enable Baseline Error Handling but leave CE reporting off (poweron
	 * default).
	 */
	if ((reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL)) !=
	    PCI_CAP_EINVAL16) {
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_base_err_default & (~PCIE_DEVCTL_CE_REPORTING_EN));

		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);
	}

	/* Enable Root Port Baseline Error Receiving */
	if (PCIE_IS_ROOT(bus_p) &&
	    (reg16 = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL)) !=
	    PCI_CAP_EINVAL16) {

		tmp16 = pcie_serr_disable_flag ?
		    (pcie_root_ctrl_default & ~PCIE_ROOT_SYS_ERR) :
		    pcie_root_ctrl_default;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "ROOT DEVCTL", 16, PCIE_ROOTCTL,
		    reg16);
	}

	/*
	 * Enable PCI-Express Advanced Error Handling if Exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		return;

	/* Set Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE SEV", 32, PCIE_AER_UCE_SERV,
		    reg32);
	}

	/* Enable Uncorrectable errors */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_mask;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE MASK", 32, PCIE_AER_UCE_MASK,
		    reg32);
	}

	/* Enable ECRC generation and checking */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = reg32 | pcie_ecrc_value;
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER CTL", 32, PCIE_AER_CTL, reg32);
	}

	/* Enable Secondary Uncorrectable errors if this is a bridge */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	/* Set Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_suce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE SEV", 32, PCIE_AER_SUCE_SERV,
		    reg32);
	}

	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, pcie_aer_suce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE MASK", 32,
		    PCIE_AER_SUCE_MASK, reg32);
	}

root:
	/*
	 * Enable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if ((reg16 = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD,
		    pcie_root_error_cmd_default);
		PCIE_DBG_AER(dip, bus_p, "AER Root Err Cmd", 16,
		    PCIE_AER_RE_CMD, reg16);
	}
}

/*
 * This function is used for enabling CE reporting and setting the AER CE mask.
 * When called from outside the pcie module it should always be preceded by
 * a call to pcie_enable_errors.
 */
int
pcie_enable_ce(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	uint16_t	device_sts, device_ctl;
	uint32_t	tmp_pcie_aer_ce_mask;

	if (!PCIE_IS_PCIE(bus_p))
		return (DDI_SUCCESS);

	/*
	 * The "pcie_ce_mask" property is used to control both the CE reporting
	 * enable field in the device control register and the AER CE mask. We
	 * leave CE reporting disabled if pcie_ce_mask is set to -1.
	 */

	tmp_pcie_aer_ce_mask = (uint32_t)ddi_prop_get_int(DDI_DEV_T_ANY, dip,
	    DDI_PROP_DONTPASS, "pcie_ce_mask", pcie_aer_ce_mask);

	if (tmp_pcie_aer_ce_mask == (uint32_t)-1) {
		/*
		 * Nothing to do since CE reporting has already been disabled.
		 */
		return (DDI_SUCCESS);
	}

	if (PCIE_HAS_AER(bus_p)) {
		/* Enable AER CE */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, tmp_pcie_aer_ce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER CE MASK", 32, PCIE_AER_CE_MASK,
		    0);

		/* Clear any pending AER CE errors */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_STS, -1);
	}

	/* clear any pending CE errors */
	if ((device_sts = PCIE_CAP_GET(16, bus_p, PCIE_DEVSTS)) !=
	    PCI_CAP_EINVAL16)
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVSTS,
		    device_sts & (~PCIE_DEVSTS_CE_DETECTED));

	/* Enable CE reporting */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL,
	    (device_ctl & (~PCIE_DEVCTL_ERR_MASK)) | pcie_base_err_default);
	PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, device_ctl);

	return (DDI_SUCCESS);
}

/* ARGSUSED */
void
pcie_disable_errors(dev_info_t *dip)
{
	pcie_bus_t	*bus_p = PCIE_DIP2BUS(dip);
	uint16_t	device_ctl;
	uint32_t	aer_reg;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Disable PCI-Express Baseline Error Handling
	 */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	device_ctl &= ~PCIE_DEVCTL_ERR_MASK;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, device_ctl);

	/*
	 * Disable PCI-Express Advanced Error Handling if Exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		goto root;

	/* Disable Uncorrectable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, PCIE_AER_UCE_BITS);

	/* Disable Correctable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, PCIE_AER_CE_BITS);

	/* Disable ECRC generation and checking */
	if ((aer_reg = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		aer_reg &= ~(PCIE_AER_CTL_ECRC_GEN_ENA |
		    PCIE_AER_CTL_ECRC_CHECK_ENA);

		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, aer_reg);
	}
	/*
	 * Disable Secondary Uncorrectable errors if this is a bridge
	 */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, PCIE_AER_SUCE_BITS);

root:
	/*
	 * Disable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if (!pcie_serr_disable_flag) {
		device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL);
		device_ctl &= ~PCIE_ROOT_SYS_ERR;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, device_ctl);
	}

	if (!PCIE_HAS_AER(bus_p))
		return;

	/* PCIE_AER_RE_CMD lives in the AER capability, not the PCIe one */
	if ((device_ctl = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		device_ctl &= ~pcie_root_error_cmd_default;
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD, device_ctl);
	}
}

/*
 * Extract bdf from "reg" property.
 */
int
pcie_get_bdf_from_dip(dev_info_t *dip, pcie_req_id_t *bdf)
{
	pci_regspec_t	*regspec;
	int		reglen;

	if (ddi_prop_lookup_int_array(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "reg", (int **)&regspec, (uint_t *)&reglen) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if (reglen < (sizeof (pci_regspec_t) / sizeof (int))) {
		ddi_prop_free(regspec);
		return (DDI_FAILURE);
	}

	/*
	 * Get phys_hi from the first element; all have the same bdf.
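	 *
	 * For example (hypothetical values), with the IEEE 1275 phys_hi
	 * layout npt000ss bbbbbbbb dddddfff rrrrrrrr, the mask
	 * PCI_REG_BDFR_M ^ PCI_REG_REG_M (0xffffff ^ 0xff) is 0xffff00. A
	 * function at bus 0x03, device 0x1c, function 2 whose first "reg"
	 * entry names register 0x10 has 0x03e210 in the low 24 bits of
	 * phys_hi; masking leaves 0x03e200, and shifting right by 8 gives
	 * the BDF 0x03e2, i.e. (bus << 8) | (device << 3) | function.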
*/ 2608 *bdf = (regspec->pci_phys_hi & (PCI_REG_BDFR_M ^ PCI_REG_REG_M)) >> 8; 2609 2610 ddi_prop_free(regspec); 2611 return (DDI_SUCCESS); 2612 } 2613 2614 dev_info_t * 2615 pcie_get_my_childs_dip(dev_info_t *dip, dev_info_t *rdip) 2616 { 2617 dev_info_t *cdip = rdip; 2618 2619 for (; ddi_get_parent(cdip) != dip; cdip = ddi_get_parent(cdip)) 2620 ; 2621 2622 return (cdip); 2623 } 2624 2625 uint32_t 2626 pcie_get_bdf_for_dma_xfer(dev_info_t *dip, dev_info_t *rdip) 2627 { 2628 dev_info_t *cdip; 2629 2630 /* 2631 * As part of the probing, the PCI fcode interpreter may setup a DMA 2632 * request if a given card has a fcode on it using dip and rdip of the 2633 * hotplug connector i.e, dip and rdip of px/pcieb driver. In this 2634 * case, return a invalid value for the bdf since we cannot get to the 2635 * bdf value of the actual device which will be initiating this DMA. 2636 */ 2637 if (rdip == dip) 2638 return (PCIE_INVALID_BDF); 2639 2640 cdip = pcie_get_my_childs_dip(dip, rdip); 2641 2642 /* 2643 * For a given rdip, return the bdf value of dip's (px or pcieb) 2644 * immediate child or secondary bus-id if dip is a PCIe2PCI bridge. 2645 * 2646 * XXX - For now, return a invalid bdf value for all PCI and PCI-X 2647 * devices since this needs more work. 2648 */ 2649 return (PCI_GET_PCIE2PCI_SECBUS(cdip) ? 
2650 PCIE_INVALID_BDF : PCI_GET_BDF(cdip)); 2651 } 2652 2653 uint32_t 2654 pcie_get_aer_uce_mask() 2655 { 2656 return (pcie_aer_uce_mask); 2657 } 2658 uint32_t 2659 pcie_get_aer_ce_mask() 2660 { 2661 return (pcie_aer_ce_mask); 2662 } 2663 uint32_t 2664 pcie_get_aer_suce_mask() 2665 { 2666 return (pcie_aer_suce_mask); 2667 } 2668 uint32_t 2669 pcie_get_serr_mask() 2670 { 2671 return (pcie_serr_disable_flag); 2672 } 2673 2674 void 2675 pcie_set_aer_uce_mask(uint32_t mask) 2676 { 2677 pcie_aer_uce_mask = mask; 2678 if (mask & PCIE_AER_UCE_UR) 2679 pcie_base_err_default &= ~PCIE_DEVCTL_UR_REPORTING_EN; 2680 else 2681 pcie_base_err_default |= PCIE_DEVCTL_UR_REPORTING_EN; 2682 2683 if (mask & PCIE_AER_UCE_ECRC) 2684 pcie_ecrc_value = 0; 2685 } 2686 2687 void 2688 pcie_set_aer_ce_mask(uint32_t mask) 2689 { 2690 pcie_aer_ce_mask = mask; 2691 } 2692 void 2693 pcie_set_aer_suce_mask(uint32_t mask) 2694 { 2695 pcie_aer_suce_mask = mask; 2696 } 2697 void 2698 pcie_set_serr_mask(uint32_t mask) 2699 { 2700 pcie_serr_disable_flag = mask; 2701 } 2702 2703 /* 2704 * Is the rdip a child of dip. Used for checking certain CTLOPS from bubbling 2705 * up erronously. Ex. ISA ctlops to a PCI-PCI Bridge. 2706 */ 2707 boolean_t 2708 pcie_is_child(dev_info_t *dip, dev_info_t *rdip) 2709 { 2710 dev_info_t *cdip = ddi_get_child(dip); 2711 for (; cdip; cdip = ddi_get_next_sibling(cdip)) 2712 if (cdip == rdip) 2713 break; 2714 return (cdip != NULL); 2715 } 2716 2717 boolean_t 2718 pcie_is_link_disabled(dev_info_t *dip) 2719 { 2720 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2721 2722 if (PCIE_IS_PCIE(bus_p)) { 2723 if (PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL) & 2724 PCIE_LINKCTL_LINK_DISABLE) 2725 return (B_TRUE); 2726 } 2727 return (B_FALSE); 2728 } 2729 2730 /* 2731 * Determines if there are any root ports attached to a root complex. 2732 * 2733 * dip - dip of root complex 2734 * 2735 * Returns - DDI_SUCCESS if there is at least one root port otherwise 2736 * DDI_FAILURE. 
2737 */ 2738 int 2739 pcie_root_port(dev_info_t *dip) 2740 { 2741 int port_type; 2742 uint16_t cap_ptr; 2743 ddi_acc_handle_t config_handle; 2744 dev_info_t *cdip = ddi_get_child(dip); 2745 2746 /* 2747 * Determine if any of the children of the passed in dip 2748 * are root ports. 2749 */ 2750 for (; cdip; cdip = ddi_get_next_sibling(cdip)) { 2751 2752 if (pci_config_setup(cdip, &config_handle) != DDI_SUCCESS) 2753 continue; 2754 2755 if ((PCI_CAP_LOCATE(config_handle, PCI_CAP_ID_PCI_E, 2756 &cap_ptr)) == DDI_FAILURE) { 2757 pci_config_teardown(&config_handle); 2758 continue; 2759 } 2760 2761 port_type = PCI_CAP_GET16(config_handle, 0, cap_ptr, 2762 PCIE_PCIECAP) & PCIE_PCIECAP_DEV_TYPE_MASK; 2763 2764 pci_config_teardown(&config_handle); 2765 2766 if (port_type == PCIE_PCIECAP_DEV_TYPE_ROOT) 2767 return (DDI_SUCCESS); 2768 } 2769 2770 /* No root ports were found */ 2771 2772 return (DDI_FAILURE); 2773 } 2774 2775 /* 2776 * Function that determines if a device a PCIe device. 2777 * 2778 * dip - dip of device. 2779 * 2780 * returns - DDI_SUCCESS if device is a PCIe device, otherwise DDI_FAILURE. 2781 */ 2782 int 2783 pcie_dev(dev_info_t *dip) 2784 { 2785 /* get parent device's device_type property */ 2786 char *device_type; 2787 int rc = DDI_FAILURE; 2788 dev_info_t *pdip = ddi_get_parent(dip); 2789 2790 if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, 2791 DDI_PROP_DONTPASS, "device_type", &device_type) 2792 != DDI_PROP_SUCCESS) { 2793 return (DDI_FAILURE); 2794 } 2795 2796 if (strcmp(device_type, "pciex") == 0) 2797 rc = DDI_SUCCESS; 2798 else 2799 rc = DDI_FAILURE; 2800 2801 ddi_prop_free(device_type); 2802 return (rc); 2803 } 2804 2805 void 2806 pcie_set_rber_fatal(dev_info_t *dip, boolean_t val) 2807 { 2808 pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip); 2809 bus_p->bus_pfd->pe_rber_fatal = val; 2810 } 2811 2812 /* 2813 * Return parent Root Port's pe_rber_fatal value. 
2814 */ 2815 boolean_t 2816 pcie_get_rber_fatal(dev_info_t *dip) 2817 { 2818 pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip); 2819 pcie_bus_t *rp_bus_p = PCIE_DIP2UPBUS(bus_p->bus_rp_dip); 2820 return (rp_bus_p->bus_pfd->pe_rber_fatal); 2821 } 2822 2823 int 2824 pcie_ari_supported(dev_info_t *dip) 2825 { 2826 uint32_t devcap2; 2827 uint16_t pciecap; 2828 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2829 uint8_t dev_type; 2830 2831 PCIE_DBG("pcie_ari_supported: dip=%p\n", dip); 2832 2833 if (bus_p == NULL) 2834 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2835 2836 dev_type = bus_p->bus_dev_type; 2837 2838 if ((dev_type != PCIE_PCIECAP_DEV_TYPE_DOWN) && 2839 (dev_type != PCIE_PCIECAP_DEV_TYPE_ROOT)) 2840 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2841 2842 if (pcie_disable_ari) { 2843 PCIE_DBG("pcie_ari_supported: dip=%p: ARI Disabled\n", dip); 2844 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2845 } 2846 2847 pciecap = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP); 2848 2849 if ((pciecap & PCIE_PCIECAP_VER_MASK) < PCIE_PCIECAP_VER_2_0) { 2850 PCIE_DBG("pcie_ari_supported: dip=%p: Not 2.0\n", dip); 2851 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2852 } 2853 2854 devcap2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCAP2); 2855 2856 PCIE_DBG("pcie_ari_supported: dip=%p: DevCap2=0x%x\n", 2857 dip, devcap2); 2858 2859 if (devcap2 & PCIE_DEVCAP2_ARI_FORWARD) { 2860 PCIE_DBG("pcie_ari_supported: " 2861 "dip=%p: ARI Forwarding is supported\n", dip); 2862 return (PCIE_ARI_FORW_SUPPORTED); 2863 } 2864 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2865 } 2866 2867 int 2868 pcie_ari_enable(dev_info_t *dip) 2869 { 2870 uint16_t devctl2; 2871 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2872 2873 PCIE_DBG("pcie_ari_enable: dip=%p\n", dip); 2874 2875 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2876 return (DDI_FAILURE); 2877 2878 devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2); 2879 devctl2 |= PCIE_DEVCTL2_ARI_FORWARD_EN; 2880 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2); 2881 2882 PCIE_DBG("pcie_ari_enable: dip=%p: writing 0x%x to 
DevCtl2\n", 2883 dip, devctl2); 2884 2885 return (DDI_SUCCESS); 2886 } 2887 2888 int 2889 pcie_ari_disable(dev_info_t *dip) 2890 { 2891 uint16_t devctl2; 2892 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2893 2894 PCIE_DBG("pcie_ari_disable: dip=%p\n", dip); 2895 2896 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2897 return (DDI_FAILURE); 2898 2899 devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2); 2900 devctl2 &= ~PCIE_DEVCTL2_ARI_FORWARD_EN; 2901 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2); 2902 2903 PCIE_DBG("pcie_ari_disable: dip=%p: writing 0x%x to DevCtl2\n", 2904 dip, devctl2); 2905 2906 return (DDI_SUCCESS); 2907 } 2908 2909 int 2910 pcie_ari_is_enabled(dev_info_t *dip) 2911 { 2912 uint16_t devctl2; 2913 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2914 2915 PCIE_DBG("pcie_ari_is_enabled: dip=%p\n", dip); 2916 2917 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2918 return (PCIE_ARI_FORW_DISABLED); 2919 2920 devctl2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCTL2); 2921 2922 PCIE_DBG("pcie_ari_is_enabled: dip=%p: DevCtl2=0x%x\n", 2923 dip, devctl2); 2924 2925 if (devctl2 & PCIE_DEVCTL2_ARI_FORWARD_EN) { 2926 PCIE_DBG("pcie_ari_is_enabled: " 2927 "dip=%p: ARI Forwarding is enabled\n", dip); 2928 return (PCIE_ARI_FORW_ENABLED); 2929 } 2930 2931 return (PCIE_ARI_FORW_DISABLED); 2932 } 2933 2934 int 2935 pcie_ari_device(dev_info_t *dip) 2936 { 2937 ddi_acc_handle_t handle; 2938 uint16_t cap_ptr; 2939 2940 PCIE_DBG("pcie_ari_device: dip=%p\n", dip); 2941 2942 /* 2943 * XXX - This function may be called before the bus_p structure 2944 * has been populated. This code can be changed to remove 2945 * pci_config_setup()/pci_config_teardown() when the RFE 2946 * to populate the bus_p structures early in boot is putback. 
	 */

	/* First make sure it is a PCIe device */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (PCIE_NOT_ARI_DEVICE);

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_ID_PCI_E, &cap_ptr))
	    != DDI_SUCCESS) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* Locate the ARI Capability */

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI),
	    &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* ARI Capability was found so it must be an ARI device */
	PCIE_DBG("pcie_ari_device: ARI Device dip=%p\n", dip);

	pci_config_teardown(&handle);
	return (PCIE_ARI_DEVICE);
}

int
pcie_ari_get_next_function(dev_info_t *dip, int *func)
{
	uint32_t val;
	uint16_t cap_ptr, next_function;
	ddi_acc_handle_t handle;

	/*
	 * XXX - This function may be called before the bus_p structure
	 * has been populated. This code can be changed to remove
	 * pci_config_setup()/pci_config_teardown() when the RFE
	 * to populate the bus_p structures early in boot is putback.
	 */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if ((PCI_CAP_LOCATE(handle,
	    PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI), &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (DDI_FAILURE);
	}

	val = PCI_CAP_GET32(handle, 0, cap_ptr, PCIE_ARI_CAP);

	next_function = (val >> PCIE_ARI_CAP_NEXT_FUNC_SHIFT) &
	    PCIE_ARI_CAP_NEXT_FUNC_MASK;

	pci_config_teardown(&handle);

	*func = next_function;

	return (DDI_SUCCESS);
}

dev_info_t *
pcie_func_to_dip(dev_info_t *dip, pcie_req_id_t function)
{
	pcie_req_id_t child_bdf;
	dev_info_t *cdip;

	for (cdip = ddi_get_child(dip); cdip;
	    cdip = ddi_get_next_sibling(cdip)) {

		if (pcie_get_bdf_from_dip(cdip, &child_bdf) == DDI_FAILURE)
			return (NULL);

		if ((child_bdf & PCIE_REQ_ID_ARI_FUNC_MASK) == function)
			return (cdip);
	}
	return (NULL);
}

#ifdef DEBUG

static void
pcie_print_bus(pcie_bus_t *bus_p)
{
	pcie_dbg("\tbus_dip = 0x%p\n", bus_p->bus_dip);
	pcie_dbg("\tbus_fm_flags = 0x%x\n", bus_p->bus_fm_flags);

	pcie_dbg("\tbus_bdf = 0x%x\n", bus_p->bus_bdf);
	pcie_dbg("\tbus_dev_ven_id = 0x%x\n", bus_p->bus_dev_ven_id);
	pcie_dbg("\tbus_rev_id = 0x%x\n", bus_p->bus_rev_id);
	pcie_dbg("\tbus_hdr_type = 0x%x\n", bus_p->bus_hdr_type);
	pcie_dbg("\tbus_dev_type = 0x%x\n", bus_p->bus_dev_type);
	pcie_dbg("\tbus_bdg_secbus = 0x%x\n", bus_p->bus_bdg_secbus);
	pcie_dbg("\tbus_pcie_off = 0x%x\n", bus_p->bus_pcie_off);
	pcie_dbg("\tbus_aer_off = 0x%x\n", bus_p->bus_aer_off);
	pcie_dbg("\tbus_pcix_off = 0x%x\n", bus_p->bus_pcix_off);
	pcie_dbg("\tbus_ecc_ver = 0x%x\n", bus_p->bus_ecc_ver);
}

/*
 * For debugging purposes set pcie_dbg_print != 0 to see printf messages
 * during interrupt.
 *
 * When a proper solution is in place this code will disappear.
 * Potential solutions are:
 * o circular buffers
 * o taskq to print at lower pil
 */
int pcie_dbg_print = 0;
void
pcie_dbg(char *fmt, ...)
{
	va_list ap;

	if (!pcie_debug_flags) {
		return;
	}
	va_start(ap, fmt);
	if (servicing_interrupt()) {
		if (pcie_dbg_print) {
			prom_vprintf(fmt, ap);
		}
	} else {
		prom_vprintf(fmt, ap);
	}
	va_end(ap);
}
#endif	/* DEBUG */

#if defined(__x86)
static void
pcie_check_io_mem_range(ddi_acc_handle_t cfg_hdl, boolean_t *empty_io_range,
    boolean_t *empty_mem_range)
{
	uint8_t class, subclass;
	uint_t val;

	class = pci_config_get8(cfg_hdl, PCI_CONF_BASCLASS);
	subclass = pci_config_get8(cfg_hdl, PCI_CONF_SUBCLASS);

	if ((class == PCI_CLASS_BRIDGE) && (subclass == PCI_BRIDGE_PCI)) {
		val = (((uint_t)pci_config_get8(cfg_hdl, PCI_BCNF_IO_BASE_LOW) &
		    PCI_BCNF_IO_MASK) << 8);
		/*
		 * We assume that a zero-based io_range[0] implies an
		 * invalid I/O range, and likewise for mem_range[0].
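		 *
		 * As a worked example of the decoding above, a hypothetical
		 * user-space sketch (not driver code) follows. The mask values
		 * are an assumption taken from the PCI-to-PCI bridge header
		 * layout: I/O Base bits 7:4 carry address bits 15:12, and
		 * Memory Base bits 15:4 carry address bits 31:20. The function
		 * and macro names are made up for illustration.
		 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for PCI_BCNF_IO_MASK and PCI_BCNF_MEM_MASK.
#define	IO_BASE_ADDR_MASK	0xf0u	// I/O Base bits 7:4 = addr 15:12
#define	MEM_BASE_ADDR_MASK	0xfff0u	// Mem Base bits 15:4 = addr 31:20

static uint32_t
bridge_io_base(uint8_t io_base_low)
{
	return (((uint32_t)io_base_low & IO_BASE_ADDR_MASK) << 8);
}

static uint32_t
bridge_mem_base(uint16_t mem_base)
{
	return (((uint32_t)mem_base & MEM_BASE_ADDR_MASK) << 16);
}

// Same heuristic as the driver: a decoded base of zero is treated as an
// empty window. The flags are only ever set, never cleared.
static void
bridge_check_ranges(uint8_t io_base_low, uint16_t mem_base,
    bool *empty_io, bool *empty_mem)
{
	assert(empty_io != NULL && empty_mem != NULL);
	if (bridge_io_base(io_base_low) == 0)
		*empty_io = true;
	if (bridge_mem_base(mem_base) == 0)
		*empty_mem = true;
}
```
		 *
		 * For instance, an I/O Base of 0x01 has only its low
		 * addressing-capability bits set, decodes to 0, and is
		 * therefore reported as an empty I/O range.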
		 */
		if (val == 0)
			*empty_io_range = B_TRUE;
		val = (((uint_t)pci_config_get16(cfg_hdl, PCI_BCNF_MEM_BASE) &
		    PCI_BCNF_MEM_MASK) << 16);
		if (val == 0)
			*empty_mem_range = B_TRUE;
	}
}

#endif /* defined(__x86) */

boolean_t
pcie_link_bw_supported(dev_info_t *dip)
{
	uint32_t linkcap;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (B_FALSE);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (B_FALSE);
	}

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	return ((linkcap & PCIE_LINKCAP_LINK_BW_NOTIFY_CAP) != 0);
}

int
pcie_link_bw_enable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (pcie_disable_lbw != 0) {
		return (DDI_FAILURE);
	}

	if (!pcie_link_bw_supported(dip)) {
		return (DDI_FAILURE);
	}

	mutex_init(&bus_p->bus_lbw_mutex, NULL, MUTEX_DRIVER, NULL);
	cv_init(&bus_p->bus_lbw_cv, NULL, CV_DRIVER, NULL);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl |= PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl |= PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_pbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_cbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_state |= PCIE_LBW_S_ENABLED;

	return (DDI_SUCCESS);
}

int
pcie_link_bw_disable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_FAILURE);
	}

	mutex_enter(&bus_p->bus_lbw_mutex);
	while ((bus_p->bus_lbw_state &
	    (PCIE_LBW_S_DISPATCHED | PCIE_LBW_S_RUNNING)) != 0) {
		cv_wait(&bus_p->bus_lbw_cv, &bus_p->bus_lbw_mutex);
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl &= ~PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl &= ~PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_state &= ~PCIE_LBW_S_ENABLED;
	kmem_free(bus_p->bus_lbw_pbuf, MAXPATHLEN);
	kmem_free(bus_p->bus_lbw_cbuf, MAXPATHLEN);
	bus_p->bus_lbw_pbuf = NULL;
	bus_p->bus_lbw_cbuf = NULL;

	mutex_destroy(&bus_p->bus_lbw_mutex);
	cv_destroy(&bus_p->bus_lbw_cv);

	return (DDI_SUCCESS);
}

void
pcie_link_bw_taskq(void *arg)
{
	dev_info_t *dip = arg;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	dev_info_t *cdip;
	boolean_t again;
	sysevent_t *se;
	sysevent_value_t se_val;
	sysevent_id_t eid;
	sysevent_attr_list_t *ev_attr_list;

top:
	ndi_devi_enter(dip);
	se = NULL;
	ev_attr_list = NULL;
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_DISPATCHED;
	bus_p->bus_lbw_state |= PCIE_LBW_S_RUNNING;
	mutex_exit(&bus_p->bus_lbw_mutex);

	/*
	 * Update our own speeds as we've likely changed something.
	 */
	pcie_capture_speeds(dip);

	/*
	 * Walk our children. We only need to update function 0, because the
	 * PCIe specification requires these values to be the same for all
	 * other functions of the device.
	 */
	for (cdip = ddi_get_child(dip); cdip != NULL;
	    cdip = ddi_get_next_sibling(cdip)) {
		pcie_bus_t *cbus_p = PCIE_DIP2BUS(cdip);

		if (cbus_p == NULL) {
			continue;
		}

		if ((cbus_p->bus_bdf & PCIE_REQ_ID_FUNC_MASK) != 0) {
			continue;
		}

		/*
		 * It's possible that this can fire while a child is otherwise
		 * only partially constructed. Therefore, if we don't have the
		 * config handle, don't bother updating the child.
		 */
		if (cbus_p->bus_cfg_hdl == NULL) {
			continue;
		}

		pcie_capture_speeds(cdip);
		break;
	}

	se = sysevent_alloc(EC_PCIE, ESC_PCIE_LINK_STATE,
	    ILLUMOS_KERN_PUB "pcie", SE_SLEEP);

	(void) ddi_pathname(dip, bus_p->bus_lbw_pbuf);
	se_val.value_type = SE_DATA_TYPE_STRING;
	se_val.value.sv_string = bus_p->bus_lbw_pbuf;
	if (sysevent_add_attr(&ev_attr_list, PCIE_EV_DETECTOR_PATH, &se_val,
	    SE_SLEEP) != 0) {
		ndi_devi_exit(dip);
		goto err;
	}

	if (cdip != NULL) {
		(void) ddi_pathname(cdip, bus_p->bus_lbw_cbuf);

		se_val.value_type = SE_DATA_TYPE_STRING;
		se_val.value.sv_string = bus_p->bus_lbw_cbuf;

		/*
		 * If this fails, that's OK. We would rather send the event
		 * without the child path than not send it at all.
		 */
		(void) sysevent_add_attr(&ev_attr_list, PCIE_EV_CHILD_PATH,
		    &se_val, SE_SLEEP);
	}

	ndi_devi_exit(dip);

	/*
	 * Before we generate and send down a sysevent, we need to tell the
	 * system that parts of the devinfo cache need to be invalidated. While
	 * the function below takes several args, it ignores them all. Because
	 * this is a global invalidation, we don't bother trying to do much more
	 * than requesting a global invalidation, lest we accidentally kick off
	 * several in a row.
	 */
	ddi_prop_cache_invalidate(DDI_DEV_T_NONE, NULL, NULL, 0);

	if (sysevent_attach_attributes(se, ev_attr_list) != 0) {
		goto err;
	}
	ev_attr_list = NULL;

	if (log_sysevent(se, SE_SLEEP, &eid) != 0) {
		goto err;
	}

err:
	sysevent_free_attr(ev_attr_list);
	sysevent_free(se);

	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_RUNNING;
	cv_broadcast(&bus_p->bus_lbw_cv);
	again = (bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) != 0;
	mutex_exit(&bus_p->bus_lbw_mutex);

	if (again) {
		goto top;
	}
}

int
pcie_link_bw_intr(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t linksts;
	uint16_t flags = PCIE_LINKSTS_LINK_BW_MGMT | PCIE_LINKSTS_AUTO_BW;

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	linksts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
	if ((linksts & flags) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	/*
	 * Check whether we've already dispatched this event. If we have, there
	 * is nothing else to do, as multiple events are coalesced. If the task
	 * is currently running, we only mark the event as dispatched; the
	 * running task notices the flag when it finishes and loops around.
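	 *
	 * A hypothetical single-threaded user-space model of this flag
	 * protocol follows; it is a sketch, not driver code. lbw_sim_t and the
	 * LBW_* values stand in for the pcie_bus_t fields and PCIE_LBW_S_*
	 * flags, and locking is omitted because nothing here runs
	 * concurrently.
	 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical flags playing the roles of PCIE_LBW_S_DISPATCHED and
// PCIE_LBW_S_RUNNING; the numeric values are not taken from the driver.
#define	LBW_DISPATCHED	0x1u
#define	LBW_RUNNING	0x2u

typedef struct lbw_sim {
	uint32_t state;		// combination of LBW_* flags
	uint32_t nevents;	// interrupts observed
	uint32_t ndispatch;	// times the task was actually queued
} lbw_sim_t;

// Interrupt path: always count the event, but queue the task only when it
// is neither queued nor running. Setting LBW_DISPATCHED while the task is
// running asks that task to loop once more instead.
static void
lbw_intr(lbw_sim_t *s)
{
	s->nevents++;
	if ((s->state & LBW_DISPATCHED) == 0) {
		if ((s->state & LBW_RUNNING) == 0)
			s->ndispatch++;
		s->state |= LBW_DISPATCHED;
	}
}

// Top of the task body: consume the pending request and mark it running.
static void
lbw_task_begin(lbw_sim_t *s)
{
	assert((s->state & LBW_DISPATCHED) != 0);
	s->state &= ~LBW_DISPATCHED;
	s->state |= LBW_RUNNING;
}

// End of the task body: returns true if another event was posted while
// the task ran, in which case the task should run again.
static bool
lbw_task_end(lbw_sim_t *s)
{
	s->state &= ~LBW_RUNNING;
	return ((s->state & LBW_DISPATCHED) != 0);
}
```
	 *
	 * In this model, three interrupts that arrive before the task runs
	 * produce a single dispatch, and an interrupt that arrives while the
	 * task is running makes the task loop once more instead of being
	 * dispatched again.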
	 */
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_nevents++;
	if ((bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) == 0) {
		if ((bus_p->bus_lbw_state & PCIE_LBW_S_RUNNING) == 0) {
			taskq_dispatch_ent(pcie_link_tq, pcie_link_bw_taskq,
			    dip, 0, &bus_p->bus_lbw_ent);
		}

		bus_p->bus_lbw_state |= PCIE_LBW_S_DISPATCHED;
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	PCIE_CAP_PUT(16, bus_p, PCIE_LINKSTS, flags);
	return (DDI_INTR_CLAIMED);
}

int
pcie_link_set_target(dev_info_t *dip, pcie_link_speed_t speed)
{
	uint16_t ctl2, rval;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	if (bus_p->bus_pcie_vers < 2) {
		return (ENOTSUP);
	}

	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_2_5;
		break;
	case PCIE_LINK_SPEED_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_5;
		break;
	case PCIE_LINK_SPEED_8:
		rval = PCIE_LINKCTL2_TARGET_SPEED_8;
		break;
	case PCIE_LINK_SPEED_16:
		rval = PCIE_LINKCTL2_TARGET_SPEED_16;
		break;
	case PCIE_LINK_SPEED_32:
		rval = PCIE_LINKCTL2_TARGET_SPEED_32;
		break;
	case PCIE_LINK_SPEED_64:
		rval = PCIE_LINKCTL2_TARGET_SPEED_64;
		break;
	default:
		return (EINVAL);
	}

	mutex_enter(&bus_p->bus_speed_mutex);
	if ((bus_p->bus_sup_speed & speed) == 0) {
		mutex_exit(&bus_p->bus_speed_mutex);
		return (ENOTSUP);
	}

	bus_p->bus_target_speed = speed;
	bus_p->bus_speed_flags |= PCIE_LINK_F_ADMIN_TARGET;

	ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
	ctl2 &= ~PCIE_LINKCTL2_TARGET_SPEED_MASK;
	ctl2 |= rval;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL2, ctl2);
	mutex_exit(&bus_p->bus_speed_mutex);

	/*
	 * Make sure our updates have been reflected in devinfo.
	 */
	pcie_capture_speeds(dip);

	return (0);
}

int
pcie_link_retrain(dev_info_t *dip)
{
	uint16_t ctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	/*
	 * The PCIe specification suggests that we make sure that the link isn't
	 * in training before issuing this command, in case there was a state
	 * machine transition prior to when we got here. We wait a bounded
	 * amount of time for training to finish and then issue the command
	 * anyway.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	ctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	ctl |= PCIE_LINKCTL_RETRAIN_LINK;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, ctl);

	/*
	 * Wait again to see if it clears before returning to the user.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	return (0);
}

/*
 * Here we go through and gather information about a given PCIe device. Our
 * situation is a little complicated at this point. This gets invoked both
 * during early initialization and during hotplug events. We cannot rely on the
 * device node having been fully set up: while the pcie_bus_t normally contains
 * a ddi_acc_handle_t for configuration space, that handle may not be valid yet,
 * as this can occur before child initialization, or we may be dealing with a
 * function that will never have a handle.
 *
 * However, we should always have a fully furnished pcie_bus_t, which means that
 * we can get its bdf and use that to access the device's configuration space.
 */
static int
pcie_fabric_feature_scan(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	uint32_t devcap;
	uint16_t mps;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;

	/*
	 * Skip over non-PCIe devices. If we encounter one here, we don't
	 * bother going through any of its children because we have no reason
	 * to believe that a PCIe device affected by this exists below it.
	 * While it is possible that there's a PCIe fabric downstream of an
	 * intermediate old PCI/PCI-X bus, in that case we'll still trigger our
	 * complex fabric detection and use the minimums.
	 *
	 * This doesn't immediately flag the fabric as complex, unlike the case
	 * below, because we could be scanning a device that is a nexus driver
	 * and already has children (although that would be somewhat surprising,
	 * as we don't anticipate being called at that point).
	 */
	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * If we fail to find a pcie_bus_t for some reason, that's somewhat
	 * surprising. We log a warning and set the complex flag, which
	 * immediately transitions us to the "complex" case: use the minimal,
	 * safe settings.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL) {
		dev_err(dip, CE_WARN, "failed to find associated pcie_bus_t "
		    "during fabric scan");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	/*
	 * Similarly, there is hardware out there which is a PCIe device but
	 * does not advertise a PCIe capability. An example of this is the IDT
	 * Tsi382A, which can hide its PCIe capability. If this is the case, we
	 * immediately terminate scanning and flag this as a 'complex' case,
	 * which causes us to use guaranteed safe settings.
	 */
	if (bus_p->bus_pcie_off == 0) {
		dev_err(dip, CE_WARN, "encountered PCIe device without PCIe "
		    "capability");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	rcdip = pcie_get_rc_dip(dip);

	/*
	 * Start by determining the device's tag support and maximum payload
	 * size. Every PCIe device reports whether it supports 8-bit tags, as
	 * this has existed since PCIe 1.0. 10-bit tags require a version 2 PCIe
	 * capability, and 14-bit tags require the DEV3 capability. If a version
	 * or capability is missing, we treat the device as lacking the
	 * corresponding tag support and clear those bits for the whole fabric.
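	 *
	 * The folding of per-device capabilities into a fabric-wide minimum
	 * can be sketched in user space as below. This is an illustrative
	 * model under assumptions, not driver code: dev_caps_t, fabric_t and
	 * the TAG_* values are hypothetical, and each device's registers are
	 * assumed to have been decoded into booleans already. The payload
	 * size encoding (0 = 128 bytes up to 5 = 4096 bytes) follows the PCIe
	 * Max_Payload_Size fields.
	 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-ins for the driver's PCIE_TAG_* flags.
#define	TAG_8B		0x1u
#define	TAG_10B_COMP	0x2u
#define	TAG_14B_COMP	0x4u
#define	TAG_ALL		(TAG_8B | TAG_10B_COMP | TAG_14B_COMP)

#define	MPS_CODE_4096	5	// largest Max_Payload_Size encoding

// What one device advertises, already decoded from its registers.
typedef struct dev_caps {
	uint8_t mps_code;	// Max_Payload_Size Supported, 0..5
	bool ext_tag_8b;	// 8-bit (extended) tags supported
	bool has_v2_cap;	// PCIe capability is version 2
	bool tag10_comp;	// 10-bit tag completer (meaningful only if v2)
	bool has_dev3;		// DEV3 capability present
	bool tag14_comp;	// 14-bit tag completer (meaningful only if DEV3)
} dev_caps_t;

typedef struct fabric {
	uint8_t mps_found;	// smallest MPS code seen so far
	uint32_t tag_found;	// tag features every device supports
} fabric_t;

// Start from the maximums; the scan only ever lowers them.
static void
fabric_reset(fabric_t *f)
{
	f->mps_found = MPS_CODE_4096;
	f->tag_found = TAG_ALL;
}

// Fold one device into the fabric-wide minimum. A missing capability
// version or structure counts as "not supported".
static void
fabric_fold(fabric_t *f, const dev_caps_t *d)
{
	assert(d->mps_code <= MPS_CODE_4096);
	if (d->mps_code < f->mps_found)
		f->mps_found = d->mps_code;
	if (!d->ext_tag_8b)
		f->tag_found &= ~TAG_8B;
	if (!d->has_v2_cap || !d->tag10_comp)
		f->tag_found &= ~TAG_10B_COMP;
	if (!d->has_dev3 || !d->tag14_comp)
		f->tag_found &= ~TAG_14B_COMP;
}

// Encoded MPS value to bytes: 0 -> 128, 1 -> 256, ..., 5 -> 4096.
static uint32_t
mps_bytes(uint8_t code)
{
	return (128u << code);
}
```
	 *
	 * Starting from the maximums and only ever clearing bits or lowering
	 * the payload code makes the result independent of walk order.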
	 */
	ASSERT3U(bus_p->bus_pcie_off, !=, 0);
	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	mps = devcap & PCIE_DEVCAP_MAX_PAYLOAD_MASK;
	if (mps < fab->pfd_mps_found) {
		fab->pfd_mps_found = mps;
	}

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) == 0) {
		fab->pfd_tag_found &= ~PCIE_TAG_8B;
	}

	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);
		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
	}

	if (bus_p->bus_dev3_off != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);
		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
	}

	/*
	 * Now that we have captured the device's information, we must ask
	 * questions of the topology. The big theory statement enumerates
	 * several types of cases. The key question we need to answer is
	 * whether we have encountered a hotpluggable bridge, which means we
	 * need to mark this fabric as complex.
	 *
	 * The big theory statement notes several different kinds of hotplug
	 * topologies that we can theoretically support. Right now we opt to
	 * keep our lives simple and focus solely on (4) and (5). These can both
	 * be summarized by a single, fairly straightforward rule:
	 *
	 *   The only allowed hotpluggable entity is a root port.
	 *
	 * This rule works, and detects cases like (6), (7), and our other
	 * invalid ones, because the hotplug code will scan and find all
	 * children before we are called here.
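	 *
	 * The rule can be modelled in isolation with the hypothetical sketch
	 * below (user space, not driver code). walk_node_t reduces each device
	 * to whether it is a root port and whether it is hotplug capable,
	 * standing in for bus_hp_sup_modes != 0; all names are invented for
	 * illustration.
	 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAB_SIMPLE, FAB_RP_HOTPLUG, FAB_COMPLEX } fab_kind_t;

// One device in the walk, reduced to the two facts the rule needs.
typedef struct walk_node {
	bool is_root_port;
	bool hotplug_capable;
} walk_node_t;

// Visit devices in walk order and apply "the only allowed hotpluggable
// entity is a root port". The first offending device ends the walk, the
// way returning DDI_WALK_TERMINATE does in the driver.
static fab_kind_t
fabric_classify(const walk_node_t *nodes, size_t n)
{
	fab_kind_t kind = FAB_SIMPLE;

	assert(n == 0 || nodes != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!nodes[i].hotplug_capable)
			continue;
		if (!nodes[i].is_root_port)
			return (FAB_COMPLEX);
		kind = FAB_RP_HOTPLUG;
	}
	return (kind);
}
```
	 *
	 * A hotplug-capable bridge below the root port yields FAB_COMPLEX,
	 * while hotplug on the root port itself is merely recorded.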
	 */
	if (bus_p->bus_hp_sup_modes != 0) {
		/*
		 * We opt to terminate in this case because there's no value in
		 * scanning the rest of the tree at this point.
		 */
		if (!PCIE_IS_RP(bus_p)) {
			fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
			return (DDI_WALK_TERMINATE);
		}

		fab->pfd_flags |= PCIE_FABRIC_F_RP_HP;
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and their children, as those would be
	 * different PCIe fabric domains for us to scan. On many hardware
	 * platforms, multiple root ports are all at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

static int
pcie_fabric_feature_set(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;
	uint32_t devcap, devctl;

	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * A missing pcie_bus_t sent us into the complex case during the scan.
	 * Here we still need to make sure every other device has the values we
	 * expect, so unlike the scan we don't terminate; we simply skip the
	 * device. The same is true for a device without a PCIe capability.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || bus_p->bus_pcie_off == 0) {
		return (DDI_WALK_CONTINUE);
	}
	rcdip = pcie_get_rc_dip(dip);

	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	devctl = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL);

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_8B) != 0) {
		devctl |= PCIE_DEVCTL_EXT_TAG_FIELD_EN;
	}

	devctl &= ~PCIE_DEVCTL_MAX_PAYLOAD_MASK;
	ASSERT0(fab->pfd_mps_act & ~PCIE_DEVCAP_MAX_PAYLOAD_MASK);
	devctl |= fab->pfd_mps_act << PCIE_DEVCTL_MAX_PAYLOAD_SHIFT;

	pci_cfgacc_put16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL, devctl);

	/*
	 * Only enable 10-bit tag requests on devices that can act as 10-bit
	 * tag requesters.
	 */
	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0 &&
	    (fab->pfd_tag_act & PCIE_TAG_10B_COMP) != 0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);

		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_REQ_SUP) != 0) {
			uint16_t devctl2 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_pcie_off + PCIE_DEVCTL2);
			devctl2 |= PCIE_DEVCTL2_10B_TAG_REQ_EN;
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_DEVCTL2, devctl2);
		}
	}

	/*
	 * Likewise for 14-bit tags; the enable bit lives in the DEV3
	 * capability's control register, so write it back there.
	 */
	if (bus_p->bus_dev3_off != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_14B_COMP) != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);

		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_REQ_SUP) != 0) {
			uint16_t devctl3 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_dev3_off + PCIE_DEVCTL3);
			devctl3 |= PCIE_DEVCTL3_14B_TAG_REQ_EN;
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_dev3_off + PCIE_DEVCTL3, devctl3);
		}
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and their children, as those would be
	 * different PCIe fabric domains for us to scan. On many hardware
	 * platforms, multiple root ports are all at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

/*
 * This is used to scan and determine the total set of PCIe fabric settings that
 * we should have in the system for everything downstream of the specified root
 * port. Note that it is only really safe to call this while working from the
 * perspective of a root port, as we will be walking down the entire device
 * tree.
 *
 * However, our callers, particularly hotplug, don't have all the information
 * we'd like. In particular, we need to check that:
 *
 *  o This is actually a PCIe device.
 *  o This is a root port (see the big theory statement to understand this
 *    constraint).
 */
void
pcie_fabric_setup(dev_info_t *dip)
{
	pcie_bus_t *bus_p;
	pcie_fabric_data_t *fab;
	dev_info_t *pdip;

	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || !PCIE_IS_RP(bus_p)) {
		return;
	}

	VERIFY3P(bus_p->bus_fab, !=, NULL);
	fab = bus_p->bus_fab;

	/*
	 * For us to call ddi_walk_devs(), our parent needs to be held.
	 * ddi_walk_devs() will take care of grabbing our dip as part of its
	 * walk before we iterate over our children.
	 *
	 * A reasonable question to ask here is why it is safe to ask for our
	 * parent. Because we have entered here through a thread that is
	 * operating on us, whether as part of attach or a hotplug event, our
	 * dip by definition has to be valid. If we were instead looking at our
	 * dip's children and then asking them for their parent, that would be
	 * a race condition.
	 */
	pdip = ddi_get_parent(dip);
	VERIFY3P(pdip, !=, NULL);
	ndi_devi_enter(pdip);
	fab->pfd_flags |= PCIE_FABRIC_F_SCANNING;

	/*
	 * Reinitialize the tracking structure to the maximum capabilities.
	 * These will be chipped away during the scan.
	 */
	fab->pfd_mps_found = PCIE_DEVCAP_MAX_PAYLOAD_4096;
	fab->pfd_tag_found = PCIE_TAG_ALL;
	fab->pfd_flags &= ~PCIE_FABRIC_F_COMPLEX;

	ddi_walk_devs(dip, pcie_fabric_feature_scan, fab);

	if ((fab->pfd_flags & PCIE_FABRIC_F_COMPLEX) != 0) {
		fab->pfd_tag_act = PCIE_TAG_5B;
		fab->pfd_mps_act = PCIE_DEVCAP_MAX_PAYLOAD_128;
	} else {
		fab->pfd_tag_act = fab->pfd_tag_found;
		fab->pfd_mps_act = fab->pfd_mps_found;
	}

	ddi_walk_devs(dip, pcie_fabric_feature_set, fab);

	fab->pfd_flags &= ~PCIE_FABRIC_F_SCANNING;
	ndi_devi_exit(pdip);
}
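
/*
 * Illustrative sketch (not part of the driver): the two passes in
 * pcie_fabric_setup() reduce to choosing the settings to apply and then
 * rewriting each device's Device Control value. The user-space model below
 * assumes the PCIe Device Control layout (bits 7:5 Max_Payload_Size, bit 8
 * Extended Tag Field Enable); fab_act_t, fab_choose(), devctl_apply() and
 * the macro names are hypothetical. A tag_act of 0 stands in for the
 * driver's PCIE_TAG_5B fallback.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Device Control fields (PCIe base specification layout); the macro
// names are made up for this sketch.
#define	DEVCTL_MPS_SHIFT	5
#define	DEVCTL_MPS_MASK		(0x7u << DEVCTL_MPS_SHIFT)	// bits 7:5
#define	DEVCTL_EXT_TAG_EN	(1u << 8)			// bit 8

#define	MPS_CODE_128		0	// encoded 128-byte payload
#define	TAG_8B			0x1u	// hypothetical "8-bit tags OK" flag

typedef struct fab_act {
	uint8_t mps_act;	// encoded MPS to program on every device
	uint32_t tag_act;	// tag features to enable; 0 means 5-bit only
} fab_act_t;

// Between the passes: a complex fabric falls back to the safe minimum,
// otherwise the scan results are used as-is.
static fab_act_t
fab_choose(bool is_complex, uint8_t mps_found, uint32_t tag_found)
{
	fab_act_t a;

	if (is_complex) {
		a.mps_act = MPS_CODE_128;
		a.tag_act = 0;
	} else {
		a.mps_act = mps_found;
		a.tag_act = tag_found;
	}
	return (a);
}

// Second pass, one device: compute its new Device Control value.
static uint16_t
devctl_apply(uint16_t devctl, bool dev_ext_tag, const fab_act_t *a)
{
	assert(a->mps_act <= 5);
	if (dev_ext_tag && (a->tag_act & TAG_8B) != 0)
		devctl = (uint16_t)(devctl | DEVCTL_EXT_TAG_EN);
	devctl = (uint16_t)(devctl & ~DEVCTL_MPS_MASK);
	devctl = (uint16_t)(devctl |
	    ((uint32_t)a->mps_act << DEVCTL_MPS_SHIFT));
	return (devctl);
}
```
 *
 * For example, in a non-complex fabric whose scan found MPS code 1
 * (256 bytes) and 8-bit tag support, a device that supports 8-bit tags
 * and has a Device Control value of 0x00e0 is rewritten to 0x0120.
 */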