/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2019 Joyent, Inc.
 * Copyright 2024 Oxide Computer Company
 */

/*
 * PCIe Initialization
 * -------------------
 *
 * The PCIe subsystem is split about and initializes itself in a couple of
 * different places. This is due to the platform-specific nature of initializing
 * resources and the nature of the SPARC PROM and how that influenced the
 * subsystem. Note that traditional PCI (mostly seen these days in Virtual
 * Machines) follows most of the same basic path outlined here, but skips a
 * large chunk of PCIe-specific initialization.
 *
 * First, there is an initial device discovery phase that is taken care of by
 * the platform. This is where we discover the set of devices that are present
 * at system power on. These devices may or may not be hot-pluggable. In
 * particular, this happens in a platform-specific way right now.
 * In general, we
 * expect most discovery to be driven by scanning each bus, device, and
 * function, and seeing what actually exists and responds to configuration space
 * reads. This is driven via pci_boot.c on x86. This may be seeded by something
 * like device tree, a PROM, supplemented with ACPI, or by knowledge that the
 * underlying platform has.
 *
 * As a part of this discovery process, the full set of resources that exist in
 * the system for PCIe are:
 *
 *   o PCI buses
 *   o Prefetchable Memory
 *   o Non-prefetchable memory
 *   o I/O ports
 *
 * This process is driven by a platform's PCI platform Resource Discovery (PRD)
 * module. The PRD definitions can be found in <sys/plat/pci_prd.h> and are used
 * to discover these resources, which will be converted into the initial set of
 * the standard properties in the system: 'regs', 'available', 'ranges', etc.
 * Currently it is up to platform-specific code (which should ideally be
 * consolidated at some point) to set up all these properties.
 *
 * As a part of the discovery process, the platform code will create a device
 * node (dev_info_t) for each discovered function and will create a PCIe nexus
 * for each overall root complex that exists in the system. Most root complexes
 * will have multiple root ports, each of which is the foundation of an
 * independent PCIe bus due to the point-to-point nature of PCIe. When a root
 * complex is found, a nexus driver such as npe (Nexus for PCI Express) is
 * attached. In the case of a non-PCIe-capable system this is where the older
 * pci nexus driver would be used instead.
 *
 * To track data about a given device on a bus, a 'pcie_bus_t' structure is
 * created for and assigned to every PCIe-based dev_info_t. This can be used to
 * find the root port and get basic information about the device, its faults,
 * and related information. This contains pointers to the corresponding root
 * port as well.
 *
 * A root complex has its pcie_bus_t initialized as part of the device discovery
 * process. That is, because we're trying to bootstrap the actual tree and most
 * platforms don't have a representation for this that's explicitly
 * discoverable, this is created manually. See callers of pcie_rc_init_bus().
 *
 * For other devices, bridges, and switches, the process is split into two.
 * There is an initial pcie_bus_t that is created which will exist before we go
 * through the actual driver attachment process. For example, on x86 this is
 * done as part of the device and function discovery. The second pass of
 * initialization is done only after the nexus driver actually is attached and
 * it goes through and finishes processing all of its children.
 *
 * Child Initialization
 * --------------------
 *
 * Generally speaking, the platform will first enumerate all PCIe devices that
 * are in the system before it actually creates a device tree. This is part of
 * the bus/device/function scanning that is performed and from that dev_info_t
 * nodes are created for each discovered device and are inserted into the
 * broader device tree. Later in boot, the actual device tree is walked and the
 * nodes go through the standard dev_info_t initialization process (DS_PROTO,
 * DS_LINKED, DS_BOUND, etc.).
 *
 * PCIe-specific initialization can roughly be broken into the following pieces:
 *
 *   1. Platform initial discovery and resource assignment
 *   2. The pcie_bus_t initialization
 *   3. Nexus driver child initialization
 *   4. Fabric initialization
 *   5. Device driver-specific initialization
 *
 * The first part of this (1) and (2) are discussed in the previous section.
 * Part (1) in particular is a combination of the PRD (platform resource
 * discovery) and general device initialization. After this, because we have a
 * device tree, most of the standard nexus initialization happens.
 *
 * (5) is somewhat simple, so let's get into it before we discuss (3) and (4).
 * This is the last thing that is called and that happens after all of the
 * others are done. This is the logic that occurs in a driver's attach(9E) entry
 * point. This is always device-specific and generally speaking should not be
 * manipulating standard PCIe registers directly on their own. For example, the
 * MSI/MSI-X, AER, Serial Number, etc. capabilities will be automatically dealt
 * with by the framework in (3) and (4) below. In many cases, particularly
 * things that are part of (4), adjusting them in the individual driver is not
 * safe.
 *
 * Finally, let's talk about (3) and (4) as these are related. The NDI provides
 * for a standard hook for a nexus to initialize its children. In our platforms,
 * there are basically two possible PCIe nexus drivers: there is the generic
 * pcieb -- PCIe bridge -- driver which is used for standard root ports,
 * switches, etc. Then there is the platform-specific primary nexus driver,
 * which is being slowly consolidated into a single one where it makes sense. An
 * example of this is npe.
 *
 * Each of these has a child initialization function which is called from their
 * DDI_CTLOPS_INITCHILD operation on the bus_ctl function pointer. This goes
 * through and initializes a large number of different pieces of PCIe-based
 * settings through the common pcie_initchild() function. This takes care of
 * things like:
 *
 *   o Advanced Error Reporting
 *   o Alternative Routing
 *   o Capturing information around link speed, width, serial numbers, etc.
 *   o Setting common properties around aborts
 *
 * There are a few caveats with this that need to be kept in mind:
 *
 *   o A dev_info_t indicates a specific function.
 *     This means that a
 *     multi-function device will not all be initialized at the same time and
 *     there is no guarantee that all children will be initialized before one of
 *     them is attached.
 *   o A child is only initialized if we have found a driver that matches an
 *     alias in the dev_info_t's compatible array property. While a lot of
 *     multi-function devices are often multiple instances of the same thing
 *     (e.g. a multi-port NIC with a function / NIC), this is not always the
 *     case and one cannot make any assumptions here.
 *
 * This in turn leads to the next form of initialization that takes place in the
 * case of (4). This is where we take care of things that need to be consistent
 * across either entire devices or more generally across an entire root port and
 * all of its children. There are a few different examples of this:
 *
 *   o Setting the maximum packet size
 *   o Determining the tag width
 *
 * Note that features which are only based on function 0, such as ASPM (Active
 * State Power Management), hardware autonomous width disable, etc. ultimately
 * do not go through this path today. There are some implications here in that
 * today several of these things are captured on functions which may not have
 * any control here. This is an area of needed improvement.
 *
 * The settings in (4) are initialized in a common way, via
 * pcie_fabric_setup(). This is called into from two different parts of
 * the stack:
 *
 *   1. When we attach a root port, which is driven by pcieb.
 *   2. When we have a hotplug event that adds a device.
 *
 * In general here we are going to use the term 'fabric' to refer to everything
 * that is downstream of a root port. This corresponds to what the PCIe
 * specification calls a 'hierarchy domain'.
 * Strictly speaking, this is fine
 * until peer-to-peer requests begin to happen that cause you to need to forward
 * things across root ports. At that point the scope of the fabric increases and
 * these settings become more complicated. We currently optimize for the much
 * more common case, which is that each root port is effectively independent
 * from a PCIe transaction routing perspective.
 *
 * Put differently, we use the term 'fabric' to refer to a set of PCIe devices
 * that can route transactions to one another, which is generally constrained to
 * everything under a root port and that root ports are independent. If this
 * constraint changes, then all one needs to do is replace the discussion of the
 * root port below with the broader root complex and system.
 *
 * A challenge with these settings is that once they're set and devices are
 * actively making requests, we cannot really change them without resetting the
 * links and cancelling all outstanding transactions via device resets. Because
 * this is not something that we want to do, we instead look at how and when we
 * set this to constrain what's going on.
 *
 * Because of this we basically say that if a given fabric has more than one
 * hot-plug capable device that's encountered, then we have to use safe defaults
 * (which we can allow an operator to tune eventually via pcieadm). If we have a
 * mix of non-hotpluggable slots with downstream endpoints present and
 * hot-pluggable slots, then we're in this case. If we don't have hot-pluggable
 * slots, then we can have an arbitrarily complex setup. Let's look at a few of
 * these visually:
 *
 * In the following diagrams, RP stands for Root Port, EP stands for Endpoint.
 * If something is hot-pluggable, then we label it with (HP).
 *
 *   (1) RP --> EP
 *   (2) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> EP
 *
 *   (3) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> Switch --> EP
 *                               +--> EP
 *                    +--> EP
 *
 *
 *   (4) RP (HP) --> EP
 *   (5) RP (HP) --> Switch --> EP
 *                         +--> EP
 *                         +--> EP
 *
 *   (6) RP --> Switch (HP) --> EP
 *   (7) RP (HP) --> Switch (HP) --> EP
 *
 * If we look at all of these, these are all cases where it's safe for us to set
 * things based on all devices. (1), (2), and (3) are straightforward because
 * they have no hot-pluggable elements. This means that nothing should come/go
 * on the system and we can set up fabric-wide properties as part of the root
 * port.
 *
 * Case (4) is the most standard one that we encounter for hot-plug. Here you
 * have a root port directly connected to an endpoint. The most common example
 * would be an NVMe device plugged into a root port. Case (5) is interesting to
 * highlight. While there is a switch and multiple endpoints there, they are
 * showing up as a unit. This ends up being a weirder variant of (4), but it is
 * safe for us to set advanced properties because we can figure out what the
 * total set should be.
 *
 * Now, the more interesting bits here are (6) and (7). The reason that (6)
 * works is that ultimately there is only a single down-stream port here that is
 * hot-pluggable and all non-hotpluggable ports do not have a device present,
 * which suggests that they will never have a device present. (7) also could be
 * made to work by making the observation that if there's truly only one
 * endpoint in a fabric, it doesn't matter how many switches there are that are
 * hot-pluggable. This would only hold if we can assume for some reason that no
 * other endpoints could be added.
 *
 * In turn, let's look at several cases that we believe aren't safe:
 *
 *   (8) RP --> Switch --> EP
 *                    +--> EP
 *               (HP) +--> EP
 *
 *   (9) RP --> Switch (HP) +--> EP
 *                     (HP) +--> EP
 *
 *   (10) RP (HP) --> Switch (HP) +--> EP
 *                           (HP) +--> EP
 *
 * All of these are situations where it's much more explicitly unsafe. Let's
 * take (8). The problem here is that the devices on the non-hotpluggable
 * downstream switches are always there and we should assume all device drivers
 * will be active and performing I/O when the hot-pluggable slot changes. If the
 * hot-pluggable slot has a lower max payload size, then we're mostly out of
 * luck. The case of (9) is very similar to (8), just that we have more hot-plug
 * capable slots.
 *
 * Finally (10) is a case of multiple instances of hotplug. (9) and (10) are the
 * more general case of (6) and (7). While we can try to detect (6) and (7) more
 * generally or try to make it safe, we're going to start with a simpler form of
 * detection for this, which roughly follows the following rules:
 *
 *   o If there are no hot-pluggable slots in an entire fabric, then we can set
 *     all fabric properties based on device capabilities.
 *   o If we encounter a hot-pluggable slot, we can only set fabric properties
 *     based on device capabilities if:
 *
 *       1. The hotpluggable slot is a root port.
 *       2. There are no other hotpluggable devices downstream of it.
 *
 * Otherwise, if neither of the above is true, then we must use the basic PCIe
 * defaults for various fabric-wide properties (discussed below). Even in these
 * more complicated cases, device-specific properties such as the configuration
 * of AERs, ASPM, etc. are still handled in the general pcie_init_bus() and
 * related discussed earlier here.
 *
 * Because the only fabrics that we'll change are those that correspond to root
 * ports, we will only call into the actual fabric feature setup when one of
 * those changes. This has the side effect of simplifying locking. When we make
 * changes here we need to be able to hold the entire device tree under the root
 * port (including the root port and its parent). This is much harder to do
 * safely when starting in the middle of the tree.
 *
 * Handling of Specific Properties
 * -------------------------------
 *
 * This section goes into the rationale behind how we initialize and program
 * various parts of the PCIe stack.
 *
 * 5-, 8-, 10- AND 14-BIT TAGS
 *
 * Tags are part of PCIe transactions and when combined with a device identifier
 * are used to uniquely identify a transaction. In PCIe parlance, a Requester
 * (someone who initiates a PCIe request) sets a unique tag in the request and
 * the Completer (someone who processes and responds to a PCIe request) echoes
 * the tag back. This means that a requester generally is responsible for
 * ensuring that they don't reuse a tag between transactions.
 *
 * Thus the number of tags that a device has relates to the number of
 * outstanding transactions that it can have, which are usually tied to the
 * number of outstanding DMA transfers. The size of these transactions is also
 * then scoped by the handling of the Maximum Packet Payload.
 *
 * In PCIe 1.0, devices default to a 5-bit tag. There was also an option to
 * support an 8-bit tag. The 8-bit extended tag did not distinguish between a
 * Requester or Completer. There was a bit to indicate device support of 8-bit
 * tags in the Device Capabilities Register of the PCIe Capability and a
 * separate bit to enable it in the Device Control Register of the PCIe
 * Capability.
 *
 * In PCIe 4.0, support for a 10-bit tag was added.
 * The specification broke
 * apart the support bit into multiple pieces. In particular, in the Device
 * Capabilities 2 register of the PCIe Capability there is a separate bit to
 * indicate whether the device supports 10-bit completions and 10-bit requests.
 * All PCIe 4.0 compliant devices are required to support 10-bit tags if they
 * operate at 16.0 GT/s speed (a PCIe Gen 4 compliant device does not have to
 * operate at Gen 4 speeds).
 *
 * This allows a device to support 10-bit completions but not 10-bit requests.
 * A device that supports 10-bit requests is required to support 10-bit
 * completions. There is no ability to enable or disable 10-bit completion
 * support in the Device Control 2 register. There is only a bit to enable
 * 10-bit requests. This distinction makes our life easier as this means that as
 * long as the entire fabric supports 10-bit completions, it doesn't matter if
 * not all devices support 10-bit requests and we can enable them as required.
 * More on this in a bit.
 *
 * In PCIe 6.0, another set of bits was added for 14-bit tags. These follow the
 * same pattern as the 10-bit tags. The biggest difference is that the
 * capabilities and control for these are found in the Device Capabilities 3
 * and Device Control 3 registers of the Device 3 Extended Capability. Similar
 * to what we see with 10-bit tags, requesters are required to support the
 * completer capability. The only control bit is for whether or not they enable
 * a 14-bit requester.
 *
 * PCIe switches sit between root ports and endpoints and show up to
 * software as a set of bridges. Bridges generally don't have to know about tags
 * as they are usually neither requesters nor completers (unless directly
 * talking to the bridge instance). That is, they are generally required to
 * forward packets without modifying them. This works until we deal with switch
 * error handling.
 * At that point, the switch may try to interpret the transaction and
 * if it doesn't understand the tagging scheme in use, return the transaction
 * with the wrong tag and also an incorrectly diagnosed error (usually a
 * malformed TLP).
 *
 * With all this, we construct a somewhat simple policy of how and when we
 * enable extended tags:
 *
 *   o If we have a complex hotplug-capable fabric (based on the discussion
 *     earlier in fabric-specific settings), then we cannot enable any of the
 *     8-bit, 10-bit, and 14-bit tagging features. This is due to the issues
 *     with intermediate PCIe switches and related.
 *
 *   o If every device supports 8-bit capable tags, then we will go through and
 *     enable those everywhere.
 *
 *   o If every device supports 10-bit capable completions, then we will enable
 *     10-bit requester on every device that supports it.
 *
 *   o If every device supports 14-bit capable completions, then we will enable
 *     14-bit requesters on every device that supports it.
 *
 * This is the simpler end of the policy and one that is relatively easy to
 * implement. While we could attempt to relax the constraint that every device
 * in the fabric implement these features by making assumptions about peer-to-
 * peer requests (that is devices at the same layer in the tree won't talk to
 * one another), that is a lot of complexity. For now, we leave such an
 * implementation to those who need it in the future.
 *
 * MAX PAYLOAD SIZE
 *
 * When performing transactions on the PCIe bus, a given transaction has a
 * maximum allowed size. This size is called the MPS or 'Maximum Payload Size'.
 * A given device reports its maximum supported size in the Device Capabilities
 * register of the PCIe Capability. It is then set in the Device Control
 * register.
 *
 * One of the challenges with this value is that different functions of a device
 * have independent values, but strictly speaking are required to actually have
 * the same value programmed in all of them lest device behavior goes awry. When
 * a device has the ARI (alternative routing ID) capability enabled, then only
 * function 0 controls the actual payload size.
 *
 * The settings for this need to be consistent throughout the fabric. A
 * Transmitter is not allowed to create a TLP that exceeds its maximum packet
 * size and a Receiver is not allowed to receive a packet that exceeds its
 * maximum packet size. In all of these cases, this would result in something
 * like a malformed TLP error.
 *
 * Effectively, this means that everything on a given fabric must have the same
 * value programmed in its Device Control register for this value. While in the
 * case of tags, switches generally weren't completers or requesters, here every
 * device along the path is subject to this. This makes the actual value that we
 * set throughout the fabric even more important and the constraints of hotplug
 * even worse to deal with.
 *
 * Because a hotplug device can be inserted with any packet size, if we hit
 * anything other than the simple hotplug cases discussed in the fabric-specific
 * settings section, then we must use the smallest size of 128 byte payloads.
 * This is because a device could be plugged in that supports something smaller
 * than we had otherwise set. If there are other active devices, those could not
 * be changed without quiescing the entire fabric. As such our algorithm is as
 * follows:
 *
 *   1. Scan the entire fabric, keeping track of the smallest seen MPS in the
 *      Device Capabilities Register.
 *   2.
 *      If we have a complex fabric, program each Device Control register with
 *      a 128 byte maximum payload size, otherwise, program it with the
 *      discovered value.
 *
 *
 * MAX READ REQUEST SIZE
 *
 * The maximum read request size (MRRS) is a much more confusing thing when
 * compared to the maximum payload size counterpart. The maximum payload size
 * (MPS) above is what restricts the actual size of a TLP. The MRRS value
 * is used to control part of the behavior of Memory Read Request, which is not
 * strictly speaking subject to the MPS. A PCIe device is allowed to respond to
 * a Memory Read Request with fewer bytes than were actually requested in a
 * single completion. In general, the default size that a root complex and its
 * root port will reply to are based around the length of a cache line.
 *
 * What this ultimately controls is the number of requests that the Requester
 * has to make and trades off bandwidth, bus sharing, and related here. For
 * example, if the maximum read request size is 4 KiB, then the requester would
 * only issue a single read request asking for 4 KiB. It would still receive
 * these as multiple packets in units of the MPS. If however, the maximum read
 * request was only say 512 B, then it would need to make 8 separate requests,
 * potentially increasing latency. On the other hand, if systems are relying on
 * total requests for QoS, then it's important to set it to something that's
 * closer to the actual MPS.
 *
 * Traditionally, the OS has not been the most straightforward about this. It's
 * important to remember that setting this up is also somewhat in the realm of
 * system firmware. Due to the PCI Firmware specification, the firmware may have
 * set up a value for not just the MRRS but also the MPS.
 * As such, our logic
 * basically left the MRRS alone and used whatever the device had there as long
 * as we weren't shrinking the device's MPS. If we were, then we'd set it to the
 * MPS. If the device was a root port, then it was just left at a system wide
 * and PCIe default of 512 bytes.
 *
 * If we survey firmware (which isn't easy due to its nature), we have seen most
 * cases where the firmware just doesn't do anything and leaves it to the
 * device's default, which is basically just the PCIe default, unless it has a
 * specific knowledge of something like say wanting to do something for an NVMe
 * device. The same is generally true of other systems, leaving it at its
 * default unless otherwise set by a device driver.
 *
 * Because this value doesn't really have the same constraints as other fabric
 * properties, this becomes much simpler and we instead opt to set it as part of
 * the device node initialization. In addition, there are no real rules about
 * different functions having different values here as it doesn't really impact
 * the TLP processing the same way that the MPS does.
 *
 * While we should add a fuller way of setting this and allowing operator
 * override of the MRRS based on things like device class, etc. that is driven
 * by pcieadm, that is left to the future. For now we opt to keep all devices
 * at their default (512 bytes or whatever firmware left behind) and we
 * ensure that root ports always have the MRRS set to 512.
 */

#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/modctl.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/sunndi.h>
#include <sys/fm/protocol.h>
#include <sys/fm/util.h>
#include <sys/promif.h>
#include <sys/disp.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <sys/pci_cap.h>
#include <sys/pci_impl.h>
#include <sys/pcie_impl.h>
#include <sys/hotplug/pci/pcie_hp.h>
#include <sys/hotplug/pci/pciehpc.h>
#include <sys/hotplug/pci/pcishpc.h>
#include <sys/hotplug/pci/pcicfg.h>
#include <sys/pci_cfgacc.h>
#include <sys/sysevent.h>
#include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/pcie.h>

/* Local functions prototypes */
static void pcie_init_pfd(dev_info_t *);
static void pcie_fini_pfd(dev_info_t *);

#ifdef DEBUG
uint_t pcie_debug_flags = 0;
static void pcie_print_bus(pcie_bus_t *bus_p);
void pcie_dbg(char *fmt, ...);
#endif /* DEBUG */

/* Variable to control default PCI-Express config settings */
ushort_t pcie_command_default =
    PCI_COMM_SERR_ENABLE |
    PCI_COMM_WAIT_CYC_ENAB |
    PCI_COMM_PARITY_DETECT |
    PCI_COMM_ME |
    PCI_COMM_MAE |
    PCI_COMM_IO;

/* xxx_fw are bits that are controlled by FW and should not be modified */
ushort_t pcie_command_default_fw =
    PCI_COMM_SPEC_CYC |
    PCI_COMM_MEMWR_INVAL |
    PCI_COMM_PALETTE_SNOOP |
    PCI_COMM_WAIT_CYC_ENAB |
    0xF800; /* Reserved Bits */

ushort_t pcie_bdg_command_default_fw =
    PCI_BCNF_BCNTRL_ISA_ENABLE |
    PCI_BCNF_BCNTRL_VGA_ENABLE |
    0xF000; /* Reserved Bits */

/* PCI-Express Base error defaults */
ushort_t pcie_base_err_default =
    PCIE_DEVCTL_CE_REPORTING_EN |
    PCIE_DEVCTL_NFE_REPORTING_EN |
    PCIE_DEVCTL_FE_REPORTING_EN |
    PCIE_DEVCTL_UR_REPORTING_EN;

/* PCI-Express Device Control Register */
uint16_t
pcie_devctl_default = PCIE_DEVCTL_RO_EN |
    PCIE_DEVCTL_MAX_READ_REQ_512;

/* PCI-Express AER Root Control Register */
#define	PCIE_ROOT_SYS_ERR	(PCIE_ROOTCTL_SYS_ERR_ON_CE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_FE_EN)

ushort_t pcie_root_ctrl_default =
    PCIE_ROOTCTL_SYS_ERR_ON_CE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_FE_EN;

/* PCI-Express Root Error Command Register */
ushort_t pcie_root_error_cmd_default =
    PCIE_AER_RE_CMD_CE_REP_EN |
    PCIE_AER_RE_CMD_NFE_REP_EN |
    PCIE_AER_RE_CMD_FE_REP_EN;

/* ECRC settings in the PCIe AER Control Register */
uint32_t pcie_ecrc_value =
    PCIE_AER_CTL_ECRC_GEN_ENA |
    PCIE_AER_CTL_ECRC_CHECK_ENA;

/*
 * If a particular platform wants to disable certain errors such as UR/MA,
 * instead of using #defines have the platform's PCIe Root Complex driver set
 * these masks using the pcie_get_XXX_mask and pcie_set_XXX_mask functions. For
 * x86 the closest thing to a PCIe root complex driver is NPE. For SPARC the
 * closest PCIe root complex driver is PX.
 *
 * pcie_serr_disable_flag : disable SERR only (in RCR and command reg) x86
 * systems may want to disable SERR in general. For root ports, enabling SERR
 * causes NMIs which are not handled and results in a watchdog timeout error.
 */
uint32_t pcie_aer_uce_mask = 0;		/* AER UE Mask */
uint32_t pcie_aer_ce_mask = 0;		/* AER CE Mask */
uint32_t pcie_aer_suce_mask = 0;	/* AER Secondary UE Mask */
uint32_t pcie_serr_disable_flag = 0;	/* Disable SERR */

/* Default severities needed for eversholt.
Error handling doesn't care */
uint32_t pcie_aer_uce_severity = PCIE_AER_UCE_MTLP | PCIE_AER_UCE_RO | \
    PCIE_AER_UCE_FCP | PCIE_AER_UCE_SD | PCIE_AER_UCE_DLP | \
    PCIE_AER_UCE_TRAINING;
uint32_t pcie_aer_suce_severity = PCIE_AER_SUCE_SERR_ASSERT | \
    PCIE_AER_SUCE_UC_ADDR_ERR | PCIE_AER_SUCE_UC_ATTR_ERR | \
    PCIE_AER_SUCE_USC_MSG_DATA_ERR;

int pcie_disable_ari = 0;

/*
 * On some platforms, such as the AMD B450 chipset, we've seen an odd
 * relationship between enabling link bandwidth notifications and AERs about
 * ECRC errors. This provides a mechanism to disable it.
 */
int pcie_disable_lbw = 0;

/*
 * Amount of time to wait for an in-progress retraining. The default is to try
 * 500 times in 10ms chunks, thus a total of 5s.
 */
uint32_t pcie_link_retrain_count = 500;
uint32_t pcie_link_retrain_delay_ms = 10;

taskq_t *pcie_link_tq;
kmutex_t pcie_link_tq_mutex;

static int pcie_link_bw_intr(dev_info_t *);
static void pcie_capture_speeds(dev_info_t *);

dev_info_t *pcie_get_rc_dip(dev_info_t *dip);

/*
 * modload support
 */

static struct modlmisc modlmisc	= {
	&mod_miscops,	/* Type of module */
	"PCI Express Framework Module"
};

static struct modlinkage modlinkage = {
	MODREV_1,
	(void	*)&modlmisc,
	NULL
};

/*
 * Global Variables needed for a non-atomic version of ddi_fm_ereport_post.
 * Currently used to send the pci.fabric ereports whose payload depends on the
 * type of PCI device it is being sent for.
 */
char		*pcie_nv_buf;
nv_alloc_t	*pcie_nvap;
nvlist_t	*pcie_nvl;

int
_init(void)
{
	int rval;

	pcie_nv_buf = kmem_alloc(ERPT_DATA_SZ, KM_SLEEP);
	pcie_nvap = fm_nva_xcreate(pcie_nv_buf, ERPT_DATA_SZ);
	pcie_nvl = fm_nvlist_create(pcie_nvap);
	mutex_init(&pcie_link_tq_mutex, NULL, MUTEX_DRIVER, NULL);

	if ((rval = mod_install(&modlinkage)) != 0) {
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_fini(void)
{
	int rval;

	if ((rval = mod_remove(&modlinkage)) == 0) {
		if (pcie_link_tq != NULL) {
			taskq_destroy(pcie_link_tq);
		}
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_info(struct modinfo *modinfop)
{
	return (mod_info(&modlinkage, modinfop));
}

/* ARGSUSED */
int
pcie_init(dev_info_t *dip, caddr_t arg)
{
	int	ret = DDI_SUCCESS;

	/*
	 * Our _init function is too early to create a taskq. Create the pcie
	 * link management taskq here now instead.
	 */
	mutex_enter(&pcie_link_tq_mutex);
	if (pcie_link_tq == NULL) {
		pcie_link_tq = taskq_create("pcie_link", 1, minclsyspri, 0, 0,
		    0);
	}
	mutex_exit(&pcie_link_tq_mutex);

	/*
	 * Create a "devctl" minor node to support DEVCTL_DEVICE_*
	 * and DEVCTL_BUS_* ioctls to this bus.
	 */
	if ((ret = ddi_create_minor_node(dip, "devctl", S_IFCHR,
	    PCI_MINOR_NUM(ddi_get_instance(dip), PCI_DEVCTL_MINOR),
	    DDI_NT_NEXUS, 0)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to create devctl minor node for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if ((ret = pcie_hp_init(dip, arg)) != DDI_SUCCESS) {
		/*
		 * On some x86 platforms, we observed unexpected hotplug
		 * initialization failures in recent years. The known cause
		 * is a hardware issue: while the problem PCI bridges have
		 * the Hotplug Capable registers set, the machine actually
		 * does not implement the expected ACPI object.
		 *
		 * We don't want to stop PCI driver attach and system boot
		 * just because of this hotplug initialization failure.
		 * Continue with a debug message printed.
		 */
		PCIE_DBG("%s%d: Failed setting hotplug framework\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

#if defined(__sparc)
		ddi_remove_minor_node(dip, "devctl");

		return (ret);
#endif /* defined(__sparc) */
	}

	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_uninit(dev_info_t *dip)
{
	int ret = DDI_SUCCESS;

	if (pcie_ari_is_enabled(dip) == PCIE_ARI_FORW_ENABLED)
		(void) pcie_ari_disable(dip);

	if ((ret = pcie_hp_uninit(dip)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to uninitialize hotplug for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if (pcie_link_bw_supported(dip)) {
		(void) pcie_link_bw_disable(dip);
	}

	ddi_remove_minor_node(dip, "devctl");

	return (ret);
}

/*
 * PCIe module interface for enabling hotplug interrupt.
 *
 * It should be called after pcie_init() is done and bus driver's
 * interrupt handlers have been attached.
 */
int
pcie_hpintr_enable(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.enable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_enable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/*
 * PCIe module interface for disabling hotplug interrupt.
 *
 * It should be called before pcie_uninit() is called and bus driver's
 * interrupt handlers are detached.
 */
int
pcie_hpintr_disable(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.disable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_disable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_intr(dev_info_t *dip)
{
	int hp, lbw;

	hp = pcie_hp_intr(dip);
	lbw = pcie_link_bw_intr(dip);

	if (hp == DDI_INTR_CLAIMED || lbw == DDI_INTR_CLAIMED) {
		return (DDI_INTR_CLAIMED);
	}

	return (DDI_INTR_UNCLAIMED);
}

/* ARGSUSED */
int
pcie_open(dev_info_t *dip, dev_t *devp, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	/*
	 * Make sure the open is for the right file type.
	 */
	if (otyp != OTYP_CHR)
		return (EINVAL);

	/*
	 * Handle the open by tracking the device state.
	 */
	if ((bus_p->bus_soft_state == PCI_SOFT_STATE_OPEN_EXCL) ||
	    ((flags & FEXCL) &&
	    (bus_p->bus_soft_state != PCI_SOFT_STATE_CLOSED))) {
		return (EBUSY);
	}

	if (flags & FEXCL)
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN_EXCL;
	else
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN;

	return (0);
}

/* ARGSUSED */
int
pcie_close(dev_info_t *dip, dev_t dev, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (otyp != OTYP_CHR)
		return (EINVAL);

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;

	return (0);
}

/* ARGSUSED */
int
pcie_ioctl(dev_info_t *dip, dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *credp, int *rvalp)
{
	struct devctl_iocdata *dcp;
	uint_t bus_state;
	int rv = DDI_SUCCESS;

	/*
	 * We can use the generic implementation for devctl ioctl
	 */
	switch (cmd) {
	case DEVCTL_DEVICE_GETSTATE:
	case DEVCTL_DEVICE_ONLINE:
	case DEVCTL_DEVICE_OFFLINE:
	case DEVCTL_BUS_GETSTATE:
		return (ndi_devctl_ioctl(dip, cmd, arg, mode, 0));
	default:
		break;
	}

	/*
	 * read devctl ioctl data
	 */
	if (ndi_dc_allochdl((void *)arg, &dcp) != NDI_SUCCESS)
		return (EFAULT);

	switch (cmd) {
	case DEVCTL_BUS_QUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_QUIESCED)
				break;
		(void) ndi_set_bus_state(dip, BUS_QUIESCED);
		break;
	case DEVCTL_BUS_UNQUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_ACTIVE)
				break;
		(void) ndi_set_bus_state(dip, BUS_ACTIVE);
		break;
	case DEVCTL_BUS_RESET:
	case DEVCTL_BUS_RESETALL:
	case DEVCTL_DEVICE_RESET:
		rv = ENOTSUP;
		break;
	default:
		rv = ENOTTY;
	}

	ndi_dc_freehdl(dcp);
	return (rv);
}

/* ARGSUSED */
int
pcie_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t
    prop_op,
    int flags, char *name, caddr_t valuep, int *lengthp)
{
	if (dev == DDI_DEV_T_ANY)
		goto skip;

	if (PCIE_IS_HOTPLUG_CAPABLE(dip) &&
	    strcmp(name, "pci-occupant") == 0) {
		int pci_dev = PCI_MINOR_NUM_TO_PCI_DEVNUM(getminor(dev));

		pcie_hp_create_occupant_props(dip, dev, pci_dev);
	}

skip:
	return (ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));
}

int
pcie_init_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t *bus_p;
	ddi_acc_handle_t eh = NULL;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL)
		return (DDI_FAILURE);

	/* Create a config access handle dedicated to error handling */
	if (pci_config_setup(cdip, &eh) != DDI_SUCCESS) {
		cmn_err(CE_WARN, "Cannot setup config access"
		    " for BDF 0x%x\n", bus_p->bus_bdf);
		return (DDI_FAILURE);
	}

	bus_p->bus_cfg_hdl = eh;
	return (DDI_SUCCESS);
}

void
pcie_fini_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip);

	pci_config_teardown(&bus_p->bus_cfg_hdl);
}

void
pcie_determine_serial(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	ddi_acc_handle_t h;
	uint16_t cap;
	uchar_t serial[8];
	uint32_t low, high;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	h = bus_p->bus_cfg_hdl;

	if ((PCI_CAP_LOCATE(h, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_SER), &cap)) ==
	    DDI_FAILURE)
		return;

	high = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_UPPER_DW);
	low = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_LOWER_DW);

	/*
	 * Here, we're trying to figure out if we had an invalid PCIe read. From
	 * looking at the contents of the value, it can be hard to tell the
	 * difference between a value that has all 1s correctly versus if we had
	 * an error. In this case, we only assume it's invalid if both register
	 * reads are invalid.
	 * We also only use 32-bit reads as we're not sure if all devices
	 * will support these as 64-bit reads, while we know that they'll
	 * support these as 32-bit reads.
	 */
	if (high == PCI_EINVAL32 && low == PCI_EINVAL32)
		return;

	serial[0] = low & 0xff;
	serial[1] = (low >> 8) & 0xff;
	serial[2] = (low >> 16) & 0xff;
	serial[3] = (low >> 24) & 0xff;
	serial[4] = high & 0xff;
	serial[5] = (high >> 8) & 0xff;
	serial[6] = (high >> 16) & 0xff;
	serial[7] = (high >> 24) & 0xff;

	(void) ndi_prop_update_byte_array(DDI_DEV_T_NONE, dip, "pcie-serial",
	    serial, sizeof (serial));
}

static void
pcie_determine_aspm(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint32_t linkcap;
	uint16_t linkctl;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);

	switch (linkcap & PCIE_LINKCAP_ASPM_SUP_MASK) {
	case PCIE_LINKCAP_ASPM_SUP_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l1");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L0S_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s,l1");
		break;
	default:
		return;
	}

	switch (linkctl & PCIE_LINKCTL_ASPM_CTL_MASK) {
	case PCIE_LINKCTL_ASPM_CTL_DIS:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "disabled");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l1");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S_L1:
		(void)
		    ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s,l1");
		break;
	}
}

/*
 * PCI-Express child device initialization. Note, this only will be called on a
 * device or function if we actually attach a device driver to it.
 *
 * This function enables generic pci-express interrupts and error handling.
 * Note, tagging, the max packet size, and related settings are all set up
 * before this point; that setup is performed in pcie_fabric_setup().
 *
 * @param cdip		child's dip (device's dip)
 * @return		DDI_SUCCESS or DDI_FAILURE
 */
/* ARGSUSED */
int
pcie_initchild(dev_info_t *cdip)
{
	uint16_t tmp16, reg16;
	pcie_bus_t *bus_p;
	uint32_t devid, venid;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL) {
		PCIE_DBG("%s: BUS not found.\n",
		    ddi_driver_name(cdip));

		return (DDI_FAILURE);
	}

	if (pcie_init_cfghdl(cdip) != DDI_SUCCESS)
		return (DDI_FAILURE);

	/*
	 * Update pcie_bus_t with real Vendor Id Device Id.
	 *
	 * For assigned devices in IOV environment, the OBP will return
	 * faked device id/vendor id on configuration read and for both
	 * properties in root domain. translate_devid() function will
	 * update the properties with real device-id/vendor-id on such
	 * platforms, so that we can utilize the properties here to get
	 * real device-id/vendor-id and overwrite the faked ids.
	 *
	 * For unassigned devices or devices in non-IOV environment, the
	 * operation below won't make a difference.
	 *
	 * The IOV implementation only supports assignment of PCIE
	 * endpoint devices. Devices under pci-pci bridges don't need
	 * operation like this.
	 */
	devid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "device-id", -1);
	venid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "vendor-id", -1);
	bus_p->bus_dev_ven_id = (devid << 16) | (venid & 0xffff);

	/* Clear the device's status register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_STAT);
	PCIE_PUT(16, bus_p, PCI_CONF_STAT, reg16);

	/* Setup the device's command register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_COMM);
	tmp16 = (reg16 & pcie_command_default_fw) | pcie_command_default;

	if (pcie_serr_disable_flag && PCIE_IS_PCIE(bus_p))
		tmp16 &= ~PCI_COMM_SERR_ENABLE;

	PCIE_PUT(16, bus_p, PCI_CONF_COMM, tmp16);
	PCIE_DBG_CFG(cdip, bus_p, "COMMAND", 16, PCI_CONF_COMM, reg16);

	/*
	 * If the device has a bus control register then program it
	 * based on the settings in the command register.
	 */
	if (PCIE_IS_BDG(bus_p)) {
		/* Clear the device's secondary status register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_SEC_STATUS);
		PCIE_PUT(16, bus_p, PCI_BCNF_SEC_STATUS, reg16);

		/* Setup the device's secondary command register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_BCNTRL);
		tmp16 = (reg16 & pcie_bdg_command_default_fw);

		tmp16 |= PCI_BCNF_BCNTRL_SERR_ENABLE;
		/*
		 * Workaround for this Nvidia bridge. Don't enable the SERR
		 * enable bit in the bridge control register as it could lead to
		 * bogus NMIs.
		 */
		if (bus_p->bus_dev_ven_id == 0x037010DE)
			tmp16 &= ~PCI_BCNF_BCNTRL_SERR_ENABLE;

		if (pcie_command_default & PCI_COMM_PARITY_DETECT)
			tmp16 |= PCI_BCNF_BCNTRL_PARITY_ENABLE;

		/*
		 * Enable Master Abort Mode only if URs have not been masked.
		 * For PCI and PCIe-PCI bridges, enabling this bit causes a
		 * Master Abort/UR to be forwarded as a UR/TA or SERR.
		 * If this bit is masked, posted requests are dropped and
		 * non-posted requests are returned with -1.
		 */
		if (pcie_aer_uce_mask & PCIE_AER_UCE_UR)
			tmp16 &= ~PCI_BCNF_BCNTRL_MAST_AB_MODE;
		else
			tmp16 |= PCI_BCNF_BCNTRL_MAST_AB_MODE;
		PCIE_PUT(16, bus_p, PCI_BCNF_BCNTRL, tmp16);
		PCIE_DBG_CFG(cdip, bus_p, "SEC CMD", 16, PCI_BCNF_BCNTRL,
		    reg16);
	}

	if (PCIE_IS_PCIE(bus_p)) {
		/* Setup PCIe device control register */
		reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
		/* note: MPS/MRRS are initialized in pcie_initchild_mps() */
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK));
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(cdip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);

		/* Enable PCIe errors */
		pcie_enable_errors(cdip);

		pcie_determine_serial(cdip);

		pcie_determine_aspm(cdip);

		pcie_capture_speeds(cdip);
	}

	bus_p->bus_ari = B_FALSE;
	if ((pcie_ari_is_enabled(ddi_get_parent(cdip))
	    == PCIE_ARI_FORW_ENABLED) && (pcie_ari_device(cdip)
	    == PCIE_ARI_DEVICE)) {
		bus_p->bus_ari = B_TRUE;
	}

	return (DDI_SUCCESS);
}

static void
pcie_init_pfd(dev_info_t *dip)
{
	pf_data_t *pfd_p = PCIE_ZALLOC(pf_data_t);
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DIP2PFD(dip) = pfd_p;

	pfd_p->pe_bus_p = bus_p;
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	/* Allocate the root fault struct for both RC and RP */
	if (PCIE_IS_ROOT(bus_p)) {
		PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
		PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
		PCIE_ROOT_EH_SRC(pfd_p) =
		    PCIE_ZALLOC(pf_root_eh_src_t);
	}

	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;

	if (PCIE_IS_BDG(bus_p))
		PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);

	if (PCIE_IS_PCIE(bus_p)) {
		PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);

		if (PCIE_IS_RP(bus_p))
			PCIE_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_rp_err_regs_t);

		PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
		PCIE_ADV_REG(pfd_p)->pcie_ue_tgt_bdf = PCIE_INVALID_BDF;

		if (PCIE_IS_RP(bus_p)) {
			PCIE_ADV_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id =
			    PCIE_INVALID_BDF;
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id =
			    PCIE_INVALID_BDF;
		} else if (PCIE_IS_PCIE_BDG(bus_p)) {
			PCIE_ADV_BDG_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_bdg_err_regs_t);
			PCIE_ADV_BDG_REG(pfd_p)->pcie_sue_tgt_bdf =
			    PCIE_INVALID_BDF;
		}

		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		}

		PCIE_SLOT_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_slot_regs_t);
		PCIE_SLOT_REG(pfd_p)->pcie_slot_regs_valid = B_FALSE;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_cap = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_control = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_status = 0;

	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		} else {
			PCIX_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcix_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p))
				PCIX_ECC_REG(pfd_p) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
		}
	}
}

static void
pcie_fini_pfd(dev_info_t *dip)
{
	pf_data_t *pfd_p = PCIE_DIP2PFD(dip);
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (PCIE_IS_PCIE(bus_p)) {
		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		}

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_ADV_RP_REG(pfd_p),
			    sizeof (pf_pcie_adv_rp_err_regs_t));
		else if (PCIE_IS_PCIE_BDG(bus_p))
			kmem_free(PCIE_ADV_BDG_REG(pfd_p),
			    sizeof (pf_pcie_adv_bdg_err_regs_t));

		kmem_free(PCIE_ADV_REG(pfd_p),
		    sizeof (pf_pcie_adv_err_regs_t));

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_RP_REG(pfd_p),
			    sizeof (pf_pcie_rp_err_regs_t));

		kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t));
	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		} else {
			if (PCIX_ECC_VERSION_CHECK(bus_p))
				kmem_free(PCIX_ECC_REG(pfd_p),
				    sizeof (pf_pcix_ecc_regs_t));

			kmem_free(PCIX_ERR_REG(pfd_p),
			    sizeof (pf_pcix_err_regs_t));
		}
	}

	if (PCIE_IS_BDG(bus_p))
		kmem_free(PCI_BDG_ERR_REG(pfd_p),
		    sizeof
		    (pf_pci_bdg_err_regs_t));

	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));

	if (PCIE_IS_ROOT(bus_p)) {
		kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
		kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
	}

	kmem_free(PCIE_DIP2PFD(dip), sizeof (pf_data_t));

	PCIE_DIP2PFD(dip) = NULL;
}


/*
 * Special functions to allocate pf_data_t's for PCIe root complexes.
 * Note: Root Complex not Root Port
 */
void
pcie_rc_init_pfd(dev_info_t *dip, pf_data_t *pfd_p)
{
	pfd_p->pe_bus_p = PCIE_DIP2DOWNBUS(dip);
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
	PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
	PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t);
	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;
	PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);
	PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);
	PCIE_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_rp_err_regs_t);
	PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id = PCIE_INVALID_BDF;
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id = PCIE_INVALID_BDF;

	PCIE_ADV_REG(pfd_p)->pcie_ue_sev = pcie_aer_uce_severity;
}

void
pcie_rc_fini_pfd(pf_data_t *pfd_p)
{
	kmem_free(PCIE_ADV_RP_REG(pfd_p), sizeof (pf_pcie_adv_rp_err_regs_t));
	kmem_free(PCIE_ADV_REG(pfd_p), sizeof (pf_pcie_adv_err_regs_t));
	kmem_free(PCIE_RP_REG(pfd_p), sizeof (pf_pcie_rp_err_regs_t));
	kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t));
	kmem_free(PCI_BDG_ERR_REG(pfd_p), sizeof (pf_pci_bdg_err_regs_t));
	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));
	kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
	kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
}

/*
 * init pcie_bus_t for root complex
 *
 * Only a few of the fields in pcie_bus_t are valid for a root complex.
 * The fields that are bracketed are initialized in this routine:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		<bus_fm_flags>
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		bus_pcie_off
 * uint16_t		<bus_aer_off>
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t	*	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	<bus_dom>
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void	*		bus_plat_private
 */
void
pcie_rc_init_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p;

	bus_p = (pcie_bus_t *)kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);
	bus_p->bus_dip = dip;
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_RC_PSEUDO;
	bus_p->bus_hdr_type = PCI_HEADER_ONE;

	/* Fake that there are AER logs */
	bus_p->bus_aer_off = (uint16_t)-1;

	/* Needed only for handle lookup */
	atomic_or_uint(&bus_p->bus_fm_flags, PF_FM_READY);

	ndi_set_bus_private(dip, B_FALSE, DEVI_PORT_TYPE_PCI,
	    bus_p);

	PCIE_BUS2DOM(bus_p) = PCIE_ZALLOC(pcie_domain_t);
}

void
pcie_rc_fini_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2DOWNBUS(dip);
	ndi_set_bus_private(dip, B_FALSE, 0, NULL);
	kmem_free(PCIE_BUS2DOM(bus_p), sizeof (pcie_domain_t));
	kmem_free(bus_p, sizeof (pcie_bus_t));
}

static int
pcie_width_to_int(pcie_link_width_t width)
{
	switch (width) {
	case PCIE_LINK_WIDTH_X1:
		return (1);
	case PCIE_LINK_WIDTH_X2:
		return (2);
	case PCIE_LINK_WIDTH_X4:
		return (4);
	case PCIE_LINK_WIDTH_X8:
		return (8);
	case PCIE_LINK_WIDTH_X12:
		return (12);
	case PCIE_LINK_WIDTH_X16:
		return (16);
	case PCIE_LINK_WIDTH_X32:
		return (32);
	default:
		return (0);
	}
}

/*
 * Return the speed in Transfers / second. This is a signed quantity to match
 * the ndi/ddi property interfaces.
 */
static int64_t
pcie_speed_to_int(pcie_link_speed_t speed)
{
	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		return (2500000000LL);
	case PCIE_LINK_SPEED_5:
		return (5000000000LL);
	case PCIE_LINK_SPEED_8:
		return (8000000000LL);
	case PCIE_LINK_SPEED_16:
		return (16000000000LL);
	case PCIE_LINK_SPEED_32:
		return (32000000000LL);
	case PCIE_LINK_SPEED_64:
		return (64000000000LL);
	default:
		return (0);
	}
}

/*
 * Translate the recorded speed information into devinfo properties.
 */
static void
pcie_speeds_to_devinfo(dev_info_t *dip, pcie_bus_t *bus_p)
{
	if (bus_p->bus_max_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-width",
		    pcie_width_to_int(bus_p->bus_max_width));
	}

	if (bus_p->bus_cur_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-width",
		    pcie_width_to_int(bus_p->bus_cur_width));
	}

	if (bus_p->bus_cur_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-speed",
		    pcie_speed_to_int(bus_p->bus_cur_speed));
	}

	if (bus_p->bus_max_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-speed",
		    pcie_speed_to_int(bus_p->bus_max_speed));
	}

	if (bus_p->bus_target_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-target-speed",
		    pcie_speed_to_int(bus_p->bus_target_speed));
	}

	if ((bus_p->bus_speed_flags & PCIE_LINK_F_ADMIN_TARGET) != 0) {
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "pcie-link-admin-target-speed");
	}

	if (bus_p->bus_sup_speed != PCIE_LINK_SPEED_UNKNOWN) {
		int64_t speeds[PCIE_NSPEEDS];
		uint_t nspeeds = 0;

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_2_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_2_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_8) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_8);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_16) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_16);
		}

		if (bus_p->bus_sup_speed &
		    PCIE_LINK_SPEED_32) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_32);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_64) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_64);
		}

		(void) ndi_prop_update_int64_array(DDI_DEV_T_NONE, dip,
		    "pcie-link-supported-speeds", speeds, nspeeds);
	}
}

/*
 * We need to capture the supported, maximum, and current device speed and
 * width. The way that this has been done has changed over time.
 *
 * Prior to PCIe Gen 3, there were only current and supported speed fields.
 * These were found in the link status and link capabilities registers of the
 * PCI express capability. With the change to PCIe Gen 3, the information in the
 * link capabilities changed to the maximum value. The supported speeds vector
 * was moved to the link capabilities 2 register.
 *
 * Now, a device may not implement some of these registers. To determine whether
 * or not it's here, we have to do the following. First, we need to check the
 * revision of the PCI express capability. The link capabilities 2 register did
 * not exist prior to version 2 of this capability. If a modern device does not
 * implement it, it is supposed to return zero for the register.
 */
static void
pcie_capture_speeds(dev_info_t *dip)
{
	uint16_t vers, status;
	uint32_t cap, cap2, ctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	dev_info_t *rcdip;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	rcdip = pcie_get_rc_dip(dip);
	if (bus_p->bus_cfg_hdl == NULL) {
		vers = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_PCIECAP);
	} else {
		vers = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP);
	}
	if (vers == PCI_EINVAL16)
		return;
	vers &= PCIE_PCIECAP_VER_MASK;

	/*
	 * Verify the capability's version.
	 */
	switch (vers) {
	case PCIE_PCIECAP_VER_1_0:
		cap2 = 0;
		ctl2 = 0;
		break;
	case PCIE_PCIECAP_VER_2_0:
		if (bus_p->bus_cfg_hdl == NULL) {
			cap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCAP2);
			ctl2 = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCTL2);
		} else {
			cap2 = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP2);
			ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
		}
		if (cap2 == PCI_EINVAL32)
			cap2 = 0;
		if (ctl2 == PCI_EINVAL16)
			ctl2 = 0;
		break;
	default:
		/* Don't try and handle an unknown version */
		return;
	}

	if (bus_p->bus_cfg_hdl == NULL) {
		status = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKSTS);
		cap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKCAP);
	} else {
		status = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		cap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	}
	if (status == PCI_EINVAL16 || cap == PCI_EINVAL32)
		return;

	mutex_enter(&bus_p->bus_speed_mutex);

	switch (status & PCIE_LINKSTS_SPEED_MASK) {
	case PCIE_LINKSTS_SPEED_2_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKSTS_SPEED_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKSTS_SPEED_8:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKSTS_SPEED_16:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKSTS_SPEED_32:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKSTS_SPEED_64:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	switch (status & PCIE_LINKSTS_NEG_WIDTH_MASK) {
	case PCIE_LINKSTS_NEG_WIDTH_X1:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X2:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X4:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X8:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X12:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X16:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X32:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	switch (cap & PCIE_LINKCAP_MAX_WIDTH_MASK) {
	case PCIE_LINKCAP_MAX_WIDTH_X1:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X2:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X4:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X8:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X12:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X16:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X32:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	/*
	 * If we have the Link Capabilities 2, then we can get the supported
	 * speeds from it and treat the bits in Link Capabilities 1 as the
	 * maximum. If we don't, then we need to follow the Implementation Note
	 * in the standard under Link Capabilities 2. Effectively, this means
	 * that if the value 10b is set in the Link Capabilities register, the
	 * device supports both 2.5 and 5 GT/s speeds.
	 */
	if (cap2 != 0) {
		if (cap2 & PCIE_LINKCAP2_SPEED_2_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_2_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_8)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_8;
		if (cap2 & PCIE_LINKCAP2_SPEED_16)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_16;
		if (cap2 & PCIE_LINKCAP2_SPEED_32)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_32;
		if (cap2 & PCIE_LINKCAP2_SPEED_64)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_64;

		switch (cap & PCIE_LINKCAP_MAX_SPEED_MASK) {
		case PCIE_LINKCAP_MAX_SPEED_2_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_8:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_8;
			break;
		case PCIE_LINKCAP_MAX_SPEED_16:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_16;
			break;
		case PCIE_LINKCAP_MAX_SPEED_32:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_32;
			break;
		case PCIE_LINKCAP_MAX_SPEED_64:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_64;
			break;
		default:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_UNKNOWN;
			break;
		}
	} else {
		if (cap & PCIE_LINKCAP_MAX_SPEED_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5 |
			    PCIE_LINK_SPEED_5;
		} else if (cap & PCIE_LINKCAP_MAX_SPEED_2_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5;
		}
	}

	switch (ctl2 & PCIE_LINKCTL2_TARGET_SPEED_MASK) {
	case PCIE_LINKCTL2_TARGET_SPEED_2_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_8:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_16:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_32:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_64:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	pcie_speeds_to_devinfo(dip, bus_p);
	mutex_exit(&bus_p->bus_speed_mutex);
}

/*
 * Partially initialize the pcie_bus_t for device (dip, bdf) so that PCI
 * config space can be accessed.
 *
 * This routine is invoked during boot, either after creating a devinfo node
 * (x86 case) or during px driver attach (sparc case); it is also invoked
 * in hotplug context after a devinfo node is created.
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_INITIAL
 * is set:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		<bus_rp_dip>
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	<bus_bdf>
 * pcie_req_id_t	<bus_rp_bdf>
 * uint32_t		<bus_dev_ven_id>
 * uint8_t		<bus_rev_id>
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		<bus_pcie_off>
 * uint16_t		<bus_aer_off>
 * uint16_t		<bus_pcix_off>
 * uint16_t		<bus_ecc_ver>
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t *	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void *		bus_plat_private
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_FINAL
 * is set:
 *
 * dev_info_t *		bus_dip
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		bus_hdr_type
 * uint16_t		bus_dev_type
 * uint8_t		<bus_bdg_secbus>
 * uint16_t		bus_pcie_off
 * uint16_t		bus_aer_off
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	<bus_bus_range>
 * ppb_ranges_t *	<bus_addr_ranges>
 * int			<bus_addr_entries>
 * pci_regspec_t *	<bus_assigned_addr>
 * int			<bus_assigned_entries>
 * pf_data_t *		<bus_pfd>
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void *		<bus_plat_private>
 */

pcie_bus_t *
pcie_init_bus(dev_info_t *dip, pcie_req_id_t bdf, uint8_t flags)
{
	uint16_t status, base, baseptr, num_cap;
	uint32_t capid;
	int range_size;
	pcie_bus_t *bus_p = NULL;
	dev_info_t *rcdip;
	dev_info_t *pdip;
	const char *errstr = NULL;

	if (!(flags & PCIE_BUS_INITIAL))
		goto initial_done;

	bus_p = kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);

	bus_p->bus_dip = dip;
	bus_p->bus_bdf = bdf;

	rcdip = pcie_get_rc_dip(dip);
	ASSERT(rcdip != NULL);

	/* Save the Vendor ID, Device ID and revision ID */
	bus_p->bus_dev_ven_id = pci_cfgacc_get32(rcdip, bdf, PCI_CONF_VENID);
	bus_p->bus_rev_id = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_REVID);
	/* Save the Header Type */
	bus_p->bus_hdr_type = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_HEADER);
	bus_p->bus_hdr_type &= PCI_HEADER_TYPE_M;

	/*
	 * Figure out the device type and all the relevant capability offsets
	 */
	/* set default value */
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_PCI_PSEUDO;

	status = pci_cfgacc_get16(rcdip, bdf, PCI_CONF_STAT);
	if (status == PCI_CAP_EINVAL16 || !(status & PCI_STAT_CAP))
		goto caps_done; /* capability not supported */

	/* Relevant conventional capabilities first */

	/* Conventional caps: PCI_CAP_ID_PCI_E, PCI_CAP_ID_PCIX
	 */
	num_cap = 2;

	switch (bus_p->bus_hdr_type) {
	case PCI_HEADER_ZERO:
		baseptr = PCI_CONF_CAP_PTR;
		break;
	case PCI_HEADER_PPB:
		baseptr = PCI_BCNF_CAP_PTR;
		break;
	case PCI_HEADER_CARDBUS:
		baseptr = PCI_CBUS_CAP_PTR;
		break;
	default:
		cmn_err(CE_WARN, "%s: unexpected pci header type:%x",
		    __func__, bus_p->bus_hdr_type);
		goto caps_done;
	}

	base = baseptr;
	for (base = pci_cfgacc_get8(rcdip, bdf, base); base && num_cap;
	    base = pci_cfgacc_get8(rcdip, bdf, base + PCI_CAP_NEXT_PTR)) {
		capid = pci_cfgacc_get8(rcdip, bdf, base);
		uint16_t pcap;

		switch (capid) {
		case PCI_CAP_ID_PCI_E:
			bus_p->bus_pcie_off = base;
			pcap = pci_cfgacc_get16(rcdip, bdf, base +
			    PCIE_PCIECAP);
			bus_p->bus_dev_type = pcap & PCIE_PCIECAP_DEV_TYPE_MASK;
			bus_p->bus_pcie_vers = pcap & PCIE_PCIECAP_VER_MASK;

			/* Check and save PCIe hotplug capability information */
			if ((PCIE_IS_RP(bus_p) || PCIE_IS_SWD(bus_p)) &&
			    (pci_cfgacc_get16(rcdip, bdf, base + PCIE_PCIECAP)
			    & PCIE_PCIECAP_SLOT_IMPL) &&
			    (pci_cfgacc_get32(rcdip, bdf, base + PCIE_SLOTCAP)
			    & PCIE_SLOTCAP_HP_CAPABLE))
				bus_p->bus_hp_sup_modes |= PCIE_NATIVE_HP_MODE;

			num_cap--;
			break;
		case PCI_CAP_ID_PCIX:
			bus_p->bus_pcix_off = base;
			if (PCIE_IS_BDG(bus_p))
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_SEC_STATUS) & PCI_PCIX_VER_MASK;
			else
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_COMMAND) & PCI_PCIX_VER_MASK;
			num_cap--;
			break;
		default:
			break;
		}
	}

	/* Check and save PCI hotplug (SHPC) capability information */
	if (PCIE_IS_BDG(bus_p)) {
		base = baseptr;
		for (base = pci_cfgacc_get8(rcdip, bdf, base);
		    base; base = pci_cfgacc_get8(rcdip, bdf,
		    base + PCI_CAP_NEXT_PTR)) {
			capid = pci_cfgacc_get8(rcdip, bdf,
			    base);
			if (capid == PCI_CAP_ID_PCI_HOTPLUG) {
				bus_p->bus_pci_hp_off = base;
				bus_p->bus_hp_sup_modes |= PCIE_PCI_HP_MODE;
				break;
			}
		}
	}

	/* Then, relevant extended capabilities */

	if (!PCIE_IS_PCIE(bus_p))
		goto caps_done;

	/* Extended caps: PCIE_EXT_CAP_ID_AER, PCIE_EXT_CAP_ID_DEV3 */
	for (base = PCIE_EXT_CAP; base; base = (capid >>
	    PCIE_EXT_CAP_NEXT_PTR_SHIFT) & PCIE_EXT_CAP_NEXT_PTR_MASK) {
		capid = pci_cfgacc_get32(rcdip, bdf, base);
		if (capid == PCI_CAP_EINVAL32)
			break;
		switch ((capid >> PCIE_EXT_CAP_ID_SHIFT) &
		    PCIE_EXT_CAP_ID_MASK) {
		case PCIE_EXT_CAP_ID_AER:
			bus_p->bus_aer_off = base;
			break;
		case PCIE_EXT_CAP_ID_DEV3:
			bus_p->bus_dev3_off = base;
			break;
		}
	}

caps_done:
	/* save RP dip and RP bdf */
	if (PCIE_IS_RP(bus_p)) {
		bus_p->bus_rp_dip = dip;
		bus_p->bus_rp_bdf = bus_p->bus_bdf;

		bus_p->bus_fab = PCIE_ZALLOC(pcie_fabric_data_t);
	} else {
		for (pdip = ddi_get_parent(dip); pdip;
		    pdip = ddi_get_parent(pdip)) {
			pcie_bus_t *parent_bus_p = PCIE_DIP2BUS(pdip);

			/*
			 * If RP dip and RP bdf in parent's bus_t have
			 * been initialized, simply use these instead of
			 * continuing up to the RC.
			 */
			if (parent_bus_p->bus_rp_dip != NULL) {
				bus_p->bus_rp_dip = parent_bus_p->bus_rp_dip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_rp_bdf;
				break;
			}

			/*
			 * When debugging be aware that some NVIDIA x86
			 * architectures have 2 nodes for each RP, one at Bus
			 * 0x0 and one at Bus 0x80.
			 * The requester is from Bus 0x80.
			 */
			if (PCIE_IS_ROOT(parent_bus_p)) {
				bus_p->bus_rp_dip = pdip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_bdf;
				break;
			}
		}
	}

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;
	(void) atomic_swap_uint(&bus_p->bus_fm_flags, 0);

	ndi_set_bus_private(dip, B_TRUE, DEVI_PORT_TYPE_PCI, (void *)bus_p);

	if (PCIE_IS_HOTPLUG_CAPABLE(dip))
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "hotplug-capable");

initial_done:
	if (!(flags & PCIE_BUS_FINAL))
		goto final_done;

	/* already initialized? */
	bus_p = PCIE_DIP2BUS(dip);

	/* Save the Range information if device is a switch/bridge */
	if (PCIE_IS_BDG(bus_p)) {
		/* get "bus-range" property */
		range_size = sizeof (pci_bus_range_t);
		if (ddi_getlongprop_buf(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "bus-range", (caddr_t)&bus_p->bus_bus_range, &range_size)
		    != DDI_PROP_SUCCESS) {
			errstr = "Cannot find \"bus-range\" property";
			cmn_err(CE_WARN,
			    "PCIE init err info failed BDF 0x%x:%s\n",
			    bus_p->bus_bdf, errstr);
		}

		/* get secondary bus number */
		rcdip = pcie_get_rc_dip(dip);
		ASSERT(rcdip != NULL);

		bus_p->bus_bdg_secbus = pci_cfgacc_get8(rcdip,
		    bus_p->bus_bdf, PCI_BCNF_SECBUS);

		/* Get "ranges" property */
		if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "ranges", (caddr_t)&bus_p->bus_addr_ranges,
		    &bus_p->bus_addr_entries) != DDI_PROP_SUCCESS)
			bus_p->bus_addr_entries = 0;
		bus_p->bus_addr_entries /= sizeof (ppb_ranges_t);
	}

	/* save "assigned-addresses" property array, ignore failures */
	if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "assigned-addresses", (caddr_t)&bus_p->bus_assigned_addr,
	    &bus_p->bus_assigned_entries) == DDI_PROP_SUCCESS)
		bus_p->bus_assigned_entries /= sizeof (pci_regspec_t);
	else
		bus_p->bus_assigned_entries = 0;

	pcie_init_pfd(dip);

	pcie_init_plat(dip);

	pcie_capture_speeds(dip);

final_done:

	PCIE_DBG("Add %s(dip 0x%p, bdf 0x%x, secbus 0x%x)\n",
	    ddi_driver_name(dip), (void *)dip, bus_p->bus_bdf,
	    bus_p->bus_bdg_secbus);
#ifdef DEBUG
	if (bus_p != NULL) {
		pcie_print_bus(bus_p);
	}
#endif

	return (bus_p);
}

/*
 * Invoked before destroying devinfo node, mostly during hotplug
 * operation to free pcie_bus_t data structure
 */
/* ARGSUSED */
void
pcie_fini_bus(dev_info_t *dip, uint8_t flags)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	ASSERT(bus_p);

	if (flags & PCIE_BUS_INITIAL) {
		pcie_fini_plat(dip);
		pcie_fini_pfd(dip);

		if (PCIE_IS_RP(bus_p)) {
			kmem_free(bus_p->bus_fab, sizeof (pcie_fabric_data_t));
			bus_p->bus_fab = NULL;
		}

		kmem_free(bus_p->bus_assigned_addr,
		    (sizeof (pci_regspec_t) * bus_p->bus_assigned_entries));
		kmem_free(bus_p->bus_addr_ranges,
		    (sizeof (ppb_ranges_t) * bus_p->bus_addr_entries));
		/* zero out the fields that have been destroyed */
		bus_p->bus_assigned_addr = NULL;
		bus_p->bus_addr_ranges = NULL;
		bus_p->bus_assigned_entries = 0;
		bus_p->bus_addr_entries = 0;
	}

	if (flags & PCIE_BUS_FINAL) {
		if (PCIE_IS_HOTPLUG_CAPABLE(dip)) {
			(void) ndi_prop_remove(DDI_DEV_T_NONE, dip,
			    "hotplug-capable");
		}

		ndi_set_bus_private(dip, B_TRUE, 0, NULL);
		kmem_free(bus_p, sizeof (pcie_bus_t));
	}
}

int
pcie_postattach_child(dev_info_t *cdip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip);

	if (!bus_p)
		return (DDI_FAILURE);

	return (pcie_enable_ce(cdip));
}

/*
 * PCI-Express child device de-initialization.
 * This function disables generic pci-express interrupts and error
 * handling.
 */
void
pcie_uninitchild(dev_info_t *cdip)
{
	pcie_disable_errors(cdip);
	pcie_fini_cfghdl(cdip);
	pcie_fini_dom(cdip);
}

/*
 * find the root complex dip
 */
dev_info_t *
pcie_get_rc_dip(dev_info_t *dip)
{
	dev_info_t *rcdip;
	pcie_bus_t *rc_bus_p;

	for (rcdip = ddi_get_parent(dip); rcdip;
	    rcdip = ddi_get_parent(rcdip)) {
		rc_bus_p = PCIE_DIP2BUS(rcdip);
		if (rc_bus_p && PCIE_IS_RC(rc_bus_p))
			break;
	}

	return (rcdip);
}

boolean_t
pcie_is_pci_device(dev_info_t *dip)
{
	dev_info_t *pdip;
	char *device_type;

	pdip = ddi_get_parent(dip);
	if (pdip == NULL)
		return (B_FALSE);

	if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, DDI_PROP_DONTPASS,
	    "device_type", &device_type) != DDI_PROP_SUCCESS)
		return (B_FALSE);

	if (strcmp(device_type, "pciex") != 0 &&
	    strcmp(device_type, "pci") != 0) {
		ddi_prop_free(device_type);
		return (B_FALSE);
	}

	ddi_prop_free(device_type);
	return (B_TRUE);
}

typedef struct {
	boolean_t init;
	uint8_t flags;
} pcie_bus_arg_t;

/*ARGSUSED*/
static int
pcie_fab_do_init_fini(dev_info_t *dip, void *arg)
{
	pcie_req_id_t bdf;
	pcie_bus_arg_t *bus_arg = (pcie_bus_arg_t *)arg;

	if (!pcie_is_pci_device(dip))
		goto out;

	if (bus_arg->init) {
		if (pcie_get_bdf_from_dip(dip, &bdf) != DDI_SUCCESS)
			goto out;

		(void) pcie_init_bus(dip, bdf, bus_arg->flags);
	} else {
		(void) pcie_fini_bus(dip, bus_arg->flags);
	}

	return (DDI_WALK_CONTINUE);

out:
	return (DDI_WALK_PRUNECHILD);
}

void
pcie_fab_init_bus(dev_info_t *rcdip, uint8_t flags)
{
	dev_info_t *dip = ddi_get_child(rcdip);
	pcie_bus_arg_t arg;

	arg.init = B_TRUE;
	arg.flags = flags;

	ndi_devi_enter(rcdip);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip);
}

void
pcie_fab_fini_bus(dev_info_t *rcdip, uint8_t flags)
{
	dev_info_t *dip = ddi_get_child(rcdip);
	pcie_bus_arg_t arg;

	arg.init = B_FALSE;
	arg.flags = flags;

	ndi_devi_enter(rcdip);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip);
}

void
pcie_enable_errors(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t reg16, tmp16;
	uint32_t reg32, tmp32;

	ASSERT(bus_p);

	/*
	 * Clear any pending errors
	 */
	pcie_clear_errors(dip);

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Enable Baseline Error Handling but leave CE reporting off (poweron
	 * default).
	 */
	if ((reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL)) !=
	    PCI_CAP_EINVAL16) {
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_base_err_default & (~PCIE_DEVCTL_CE_REPORTING_EN));

		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);
	}

	/* Enable Root Port Baseline Error Receiving */
	if (PCIE_IS_ROOT(bus_p) &&
	    (reg16 = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL)) !=
	    PCI_CAP_EINVAL16) {

		tmp16 = pcie_serr_disable_flag ?
		    (pcie_root_ctrl_default & ~PCIE_ROOT_SYS_ERR) :
		    pcie_root_ctrl_default;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "ROOT DEVCTL", 16, PCIE_ROOTCTL,
		    reg16);
	}

	/*
	 * Enable PCI-Express Advanced Error Handling if it exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		return;

	/* Set Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE SEV", 32, PCIE_AER_UCE_SERV,
		    reg32);
	}

	/* Enable Uncorrectable errors */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_mask;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE MASK", 32, PCIE_AER_UCE_MASK,
		    reg32);
	}

	/* Enable ECRC generation and checking */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = reg32 | pcie_ecrc_value;
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER CTL", 32, PCIE_AER_CTL, reg32);
	}

	/* Enable Secondary Uncorrectable errors if this is a bridge */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	/* Set Secondary Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_suce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE SEV", 32, PCIE_AER_SUCE_SERV,
		    reg32);
	}

	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, pcie_aer_suce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE MASK", 32,
		    PCIE_AER_SUCE_MASK, reg32);
	}

root:
	/*
	 * Enable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if ((reg16 = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD,
		    pcie_root_error_cmd_default);
		PCIE_DBG_AER(dip, bus_p, "AER Root Err Cmd", 16,
		    PCIE_AER_RE_CMD, reg16);
	}
}

/*
 * This function is used for enabling CE reporting and setting the AER CE mask.
 * When called from outside the pcie module it should always be preceded by
 * a call to pcie_enable_errors.
 */
int
pcie_enable_ce(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t device_sts, device_ctl;
	uint32_t tmp_pcie_aer_ce_mask;

	if (!PCIE_IS_PCIE(bus_p))
		return (DDI_SUCCESS);

	/*
	 * The "pcie_ce_mask" property is used to control both the CE reporting
	 * enable field in the device control register and the AER CE mask. We
	 * leave CE reporting disabled if pcie_ce_mask is set to -1.
	 */

	tmp_pcie_aer_ce_mask = (uint32_t)ddi_prop_get_int(DDI_DEV_T_ANY, dip,
	    DDI_PROP_DONTPASS, "pcie_ce_mask", pcie_aer_ce_mask);

	if (tmp_pcie_aer_ce_mask == (uint32_t)-1) {
		/*
		 * Nothing to do since CE reporting has already been disabled.
		 */
		return (DDI_SUCCESS);
	}

	if (PCIE_HAS_AER(bus_p)) {
		/* Enable AER CE */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, tmp_pcie_aer_ce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER CE MASK", 32, PCIE_AER_CE_MASK,
		    0);

		/* Clear any pending AER CE errors */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_STS, -1);
	}

	/* clear any pending CE errors */
	if ((device_sts = PCIE_CAP_GET(16, bus_p, PCIE_DEVSTS)) !=
	    PCI_CAP_EINVAL16)
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVSTS,
		    device_sts & (~PCIE_DEVSTS_CE_DETECTED));

	/* Enable CE reporting */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL,
	    (device_ctl & (~PCIE_DEVCTL_ERR_MASK)) | pcie_base_err_default);
	PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, device_ctl);

	return (DDI_SUCCESS);
}

/* ARGSUSED */
void
pcie_disable_errors(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t device_ctl;
	uint32_t aer_reg;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Disable PCI-Express Baseline Error Handling
	 */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	device_ctl &= ~PCIE_DEVCTL_ERR_MASK;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, device_ctl);

	/*
	 * Disable PCI-Express Advanced Error Handling if it exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		goto root;

	/* Disable Uncorrectable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, PCIE_AER_UCE_BITS);

	/* Disable Correctable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, PCIE_AER_CE_BITS);

	/* Disable ECRC generation and checking */
	if ((aer_reg = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		aer_reg &= ~(PCIE_AER_CTL_ECRC_GEN_ENA |
		    PCIE_AER_CTL_ECRC_CHECK_ENA);

		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, aer_reg);
	}
	/*
	 * Disable
	 * Secondary Uncorrectable errors if this is a bridge
	 */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, PCIE_AER_SUCE_BITS);

root:
	/*
	 * Disable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if (!pcie_serr_disable_flag) {
		device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL);
		device_ctl &= ~PCIE_ROOT_SYS_ERR;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, device_ctl);
	}

	if (!PCIE_HAS_AER(bus_p))
		return;

	/* The Root Error Command register lives in the AER capability */
	if ((device_ctl = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		device_ctl &= ~pcie_root_error_cmd_default;
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD, device_ctl);
	}
}

/*
 * Extract bdf from "reg" property.
 */
int
pcie_get_bdf_from_dip(dev_info_t *dip, pcie_req_id_t *bdf)
{
	pci_regspec_t *regspec;
	int reglen;

	if (ddi_prop_lookup_int_array(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "reg", (int **)&regspec, (uint_t *)&reglen) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if (reglen < (sizeof (pci_regspec_t) / sizeof (int))) {
		ddi_prop_free(regspec);
		return (DDI_FAILURE);
	}

	/* Get phys_hi from first element. All have same bdf.
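	 *
	 * Illustration only, assuming the usual IEEE 1275 phys.hi layout (bus
	 * in bits 23:16, device in bits 15:11, function in bits 10:8): a
	 * phys_hi of 0x00021800 describes bus 2, device 3, function 0, and
	 * masking off the register bits and shifting right by 8 gives a bdf
	 * of 0x0218.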
	 */
	*bdf = (regspec->pci_phys_hi & (PCI_REG_BDFR_M ^ PCI_REG_REG_M)) >> 8;

	ddi_prop_free(regspec);
	return (DDI_SUCCESS);
}

dev_info_t *
pcie_get_my_childs_dip(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip = rdip;

	for (; ddi_get_parent(cdip) != dip; cdip = ddi_get_parent(cdip))
		;

	return (cdip);
}

uint32_t
pcie_get_bdf_for_dma_xfer(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip;

	/*
	 * As part of probing, the PCI fcode interpreter may set up a DMA
	 * request if a given card has fcode on it, using the dip and rdip of
	 * the hotplug connector, i.e., the dip and rdip of the px/pcieb
	 * driver. In this case, return an invalid value for the bdf since we
	 * cannot get to the bdf value of the actual device which will be
	 * initiating this DMA.
	 */
	if (rdip == dip)
		return (PCIE_INVALID_BDF);

	cdip = pcie_get_my_childs_dip(dip, rdip);

	/*
	 * For a given rdip, return the bdf value of dip's (px or pcieb)
	 * immediate child or secondary bus-id if dip is a PCIe2PCI bridge.
	 *
	 * XXX - For now, return an invalid bdf value for all PCI and PCI-X
	 * devices since this needs more work.
	 */
	return (PCI_GET_PCIE2PCI_SECBUS(cdip) ?
	    PCIE_INVALID_BDF : PCI_GET_BDF(cdip));
}

uint32_t
pcie_get_aer_uce_mask()
{
	return (pcie_aer_uce_mask);
}

uint32_t
pcie_get_aer_ce_mask()
{
	return (pcie_aer_ce_mask);
}

uint32_t
pcie_get_aer_suce_mask()
{
	return (pcie_aer_suce_mask);
}

uint32_t
pcie_get_serr_mask()
{
	return (pcie_serr_disable_flag);
}

void
pcie_set_aer_uce_mask(uint32_t mask)
{
	pcie_aer_uce_mask = mask;
	if (mask & PCIE_AER_UCE_UR)
		pcie_base_err_default &= ~PCIE_DEVCTL_UR_REPORTING_EN;
	else
		pcie_base_err_default |= PCIE_DEVCTL_UR_REPORTING_EN;

	if (mask & PCIE_AER_UCE_ECRC)
		pcie_ecrc_value = 0;
}

void
pcie_set_aer_ce_mask(uint32_t mask)
{
	pcie_aer_ce_mask = mask;
}

void
pcie_set_aer_suce_mask(uint32_t mask)
{
	pcie_aer_suce_mask = mask;
}

void
pcie_set_serr_mask(uint32_t mask)
{
	pcie_serr_disable_flag = mask;
}

/*
 * Is the rdip a child of dip? Used to keep certain CTLOPS from bubbling
 * up erroneously, e.g. ISA ctlops to a PCI-PCI Bridge.
 */
boolean_t
pcie_is_child(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip = ddi_get_child(dip);
	for (; cdip; cdip = ddi_get_next_sibling(cdip))
		if (cdip == rdip)
			break;
	return (cdip != NULL);
}

boolean_t
pcie_is_link_disabled(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (PCIE_IS_PCIE(bus_p)) {
		if (PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL) &
		    PCIE_LINKCTL_LINK_DISABLE)
			return (B_TRUE);
	}
	return (B_FALSE);
}

/*
 * Determines if there are any root ports attached to a root complex.
 *
 * dip - dip of root complex
 *
 * Returns - DDI_SUCCESS if there is at least one root port, otherwise
 * DDI_FAILURE.
 */
int
pcie_root_port(dev_info_t *dip)
{
	int port_type;
	uint16_t cap_ptr;
	ddi_acc_handle_t config_handle;
	dev_info_t *cdip = ddi_get_child(dip);

	/*
	 * Determine if any of the children of the passed in dip
	 * are root ports.
	 */
	for (; cdip; cdip = ddi_get_next_sibling(cdip)) {

		if (pci_config_setup(cdip, &config_handle) != DDI_SUCCESS)
			continue;

		if ((PCI_CAP_LOCATE(config_handle, PCI_CAP_ID_PCI_E,
		    &cap_ptr)) == DDI_FAILURE) {
			pci_config_teardown(&config_handle);
			continue;
		}

		port_type = PCI_CAP_GET16(config_handle, 0, cap_ptr,
		    PCIE_PCIECAP) & PCIE_PCIECAP_DEV_TYPE_MASK;

		pci_config_teardown(&config_handle);

		if (port_type == PCIE_PCIECAP_DEV_TYPE_ROOT)
			return (DDI_SUCCESS);
	}

	/* No root ports were found */

	return (DDI_FAILURE);
}

/*
 * Determines if a device is a PCIe device.
 *
 * dip - dip of device.
 *
 * Returns - DDI_SUCCESS if device is a PCIe device, otherwise DDI_FAILURE.
 */
int
pcie_dev(dev_info_t *dip)
{
	/* get parent device's device_type property */
	char *device_type;
	int rc = DDI_FAILURE;
	dev_info_t *pdip = ddi_get_parent(dip);

	if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip,
	    DDI_PROP_DONTPASS, "device_type", &device_type)
	    != DDI_PROP_SUCCESS) {
		return (DDI_FAILURE);
	}

	if (strcmp(device_type, "pciex") == 0)
		rc = DDI_SUCCESS;
	else
		rc = DDI_FAILURE;

	ddi_prop_free(device_type);
	return (rc);
}

void
pcie_set_rber_fatal(dev_info_t *dip, boolean_t val)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	bus_p->bus_pfd->pe_rber_fatal = val;
}

/*
 * Return parent Root Port's pe_rber_fatal value.
 */
boolean_t
pcie_get_rber_fatal(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	pcie_bus_t *rp_bus_p = PCIE_DIP2UPBUS(bus_p->bus_rp_dip);
	return (rp_bus_p->bus_pfd->pe_rber_fatal);
}

int
pcie_ari_supported(dev_info_t *dip)
{
	uint32_t devcap2;
	uint16_t pciecap;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint8_t dev_type;

	PCIE_DBG("pcie_ari_supported: dip=%p\n", dip);

	if (bus_p == NULL)
		return (PCIE_ARI_FORW_NOT_SUPPORTED);

	dev_type = bus_p->bus_dev_type;

	if ((dev_type != PCIE_PCIECAP_DEV_TYPE_DOWN) &&
	    (dev_type != PCIE_PCIECAP_DEV_TYPE_ROOT))
		return (PCIE_ARI_FORW_NOT_SUPPORTED);

	if (pcie_disable_ari) {
		PCIE_DBG("pcie_ari_supported: dip=%p: ARI Disabled\n", dip);
		return (PCIE_ARI_FORW_NOT_SUPPORTED);
	}

	pciecap = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP);

	if ((pciecap & PCIE_PCIECAP_VER_MASK) < PCIE_PCIECAP_VER_2_0) {
		PCIE_DBG("pcie_ari_supported: dip=%p: Not 2.0\n", dip);
		return (PCIE_ARI_FORW_NOT_SUPPORTED);
	}

	devcap2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCAP2);

	PCIE_DBG("pcie_ari_supported: dip=%p: DevCap2=0x%x\n",
	    dip, devcap2);

	if (devcap2 & PCIE_DEVCAP2_ARI_FORWARD) {
		PCIE_DBG("pcie_ari_supported: "
		    "dip=%p: ARI Forwarding is supported\n", dip);
		return (PCIE_ARI_FORW_SUPPORTED);
	}
	return (PCIE_ARI_FORW_NOT_SUPPORTED);
}

int
pcie_ari_enable(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_enable: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (DDI_FAILURE);

	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);
	devctl2 |= PCIE_DEVCTL2_ARI_FORWARD_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2);

	PCIE_DBG("pcie_ari_enable: dip=%p: writing 0x%x to 
DevCtl2\n",
	    dip, devctl2);

	return (DDI_SUCCESS);
}

int
pcie_ari_disable(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_disable: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (DDI_FAILURE);

	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);
	devctl2 &= ~PCIE_DEVCTL2_ARI_FORWARD_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2);

	PCIE_DBG("pcie_ari_disable: dip=%p: writing 0x%x to DevCtl2\n",
	    dip, devctl2);

	return (DDI_SUCCESS);
}

int
pcie_ari_is_enabled(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_is_enabled: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (PCIE_ARI_FORW_DISABLED);

	/* Device Control 2 is a 16-bit register */
	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);

	PCIE_DBG("pcie_ari_is_enabled: dip=%p: DevCtl2=0x%x\n",
	    dip, devctl2);

	if (devctl2 & PCIE_DEVCTL2_ARI_FORWARD_EN) {
		PCIE_DBG("pcie_ari_is_enabled: "
		    "dip=%p: ARI Forwarding is enabled\n", dip);
		return (PCIE_ARI_FORW_ENABLED);
	}

	return (PCIE_ARI_FORW_DISABLED);
}

int
pcie_ari_device(dev_info_t *dip)
{
	ddi_acc_handle_t handle;
	uint16_t cap_ptr;

	PCIE_DBG("pcie_ari_device: dip=%p\n", dip);

	/*
	 * XXX - This function may be called before the bus_p structure
	 * has been populated. This code can be changed to remove
	 * pci_config_setup()/pci_config_teardown() when the RFE
	 * to populate the bus_p structures early in boot is putback.
	 */

	/* First make sure it is a PCIe device */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (PCIE_NOT_ARI_DEVICE);

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_ID_PCI_E, &cap_ptr))
	    != DDI_SUCCESS) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* Locate the ARI Capability */

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI),
	    &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* ARI Capability was found so it must be an ARI device */
	PCIE_DBG("pcie_ari_device: ARI Device dip=%p\n", dip);

	pci_config_teardown(&handle);
	return (PCIE_ARI_DEVICE);
}

int
pcie_ari_get_next_function(dev_info_t *dip, int *func)
{
	uint32_t val;
	uint16_t cap_ptr, next_function;
	ddi_acc_handle_t handle;

	/*
	 * XXX - This function may be called before the bus_p structure
	 * has been populated. This code can be changed to remove
	 * pci_config_setup()/pci_config_teardown() when the RFE
	 * to populate the bus_p structures early in boot is putback.
	 */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if ((PCI_CAP_LOCATE(handle,
	    PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI), &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (DDI_FAILURE);
	}

	val = PCI_CAP_GET32(handle, 0, cap_ptr, PCIE_ARI_CAP);

	next_function = (val >> PCIE_ARI_CAP_NEXT_FUNC_SHIFT) &
	    PCIE_ARI_CAP_NEXT_FUNC_MASK;

	pci_config_teardown(&handle);

	*func = next_function;

	return (DDI_SUCCESS);
}

dev_info_t *
pcie_func_to_dip(dev_info_t *dip, pcie_req_id_t function)
{
	pcie_req_id_t child_bdf;
	dev_info_t *cdip;

	for (cdip = ddi_get_child(dip); cdip;
	    cdip = ddi_get_next_sibling(cdip)) {

		if (pcie_get_bdf_from_dip(cdip, &child_bdf) == DDI_FAILURE)
			return (NULL);

		if ((child_bdf & PCIE_REQ_ID_ARI_FUNC_MASK) == function)
			return (cdip);
	}
	return (NULL);
}

#ifdef DEBUG

static void
pcie_print_bus(pcie_bus_t *bus_p)
{
	pcie_dbg("\tbus_dip = 0x%p\n", bus_p->bus_dip);
	pcie_dbg("\tbus_fm_flags = 0x%x\n", bus_p->bus_fm_flags);

	pcie_dbg("\tbus_bdf = 0x%x\n", bus_p->bus_bdf);
	pcie_dbg("\tbus_dev_ven_id = 0x%x\n", bus_p->bus_dev_ven_id);
	pcie_dbg("\tbus_rev_id = 0x%x\n", bus_p->bus_rev_id);
	pcie_dbg("\tbus_hdr_type = 0x%x\n", bus_p->bus_hdr_type);
	pcie_dbg("\tbus_dev_type = 0x%x\n", bus_p->bus_dev_type);
	pcie_dbg("\tbus_bdg_secbus = 0x%x\n", bus_p->bus_bdg_secbus);
	pcie_dbg("\tbus_pcie_off = 0x%x\n", bus_p->bus_pcie_off);
	pcie_dbg("\tbus_aer_off = 0x%x\n", bus_p->bus_aer_off);
	pcie_dbg("\tbus_pcix_off = 0x%x\n", bus_p->bus_pcix_off);
	pcie_dbg("\tbus_ecc_ver = 0x%x\n", bus_p->bus_ecc_ver);
}

/*
 * For debugging purposes set pcie_dbg_print != 0 to see printf messages
 * during interrupt.
 *
 * When a proper solution is in place this code will disappear.
 * Potential solutions are:
 * o circular buffers
 * o taskq to print at lower pil
 */
int pcie_dbg_print = 0;
void
pcie_dbg(char *fmt, ...)
{
	va_list ap;

	if (!pcie_debug_flags) {
		return;
	}
	va_start(ap, fmt);
	if (servicing_interrupt()) {
		if (pcie_dbg_print) {
			prom_vprintf(fmt, ap);
		}
	} else {
		prom_vprintf(fmt, ap);
	}
	va_end(ap);
}
#endif /* DEBUG */

boolean_t
pcie_link_bw_supported(dev_info_t *dip)
{
	uint32_t linkcap;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (B_FALSE);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (B_FALSE);
	}

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	return ((linkcap & PCIE_LINKCAP_LINK_BW_NOTIFY_CAP) != 0);
}

int
pcie_link_bw_enable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (pcie_disable_lbw != 0) {
		return (DDI_FAILURE);
	}

	if (!pcie_link_bw_supported(dip)) {
		return (DDI_FAILURE);
	}

	mutex_init(&bus_p->bus_lbw_mutex, NULL, MUTEX_DRIVER, NULL);
	cv_init(&bus_p->bus_lbw_cv, NULL, CV_DRIVER, NULL);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl |= PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl |= PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_pbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_cbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_state |= PCIE_LBW_S_ENABLED;

	return (DDI_SUCCESS);
}

int
pcie_link_bw_disable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_FAILURE);
	}

	mutex_enter(&bus_p->bus_lbw_mutex);
	while ((bus_p->bus_lbw_state &
	    (PCIE_LBW_S_DISPATCHED | PCIE_LBW_S_RUNNING)) != 0) {
		cv_wait(&bus_p->bus_lbw_cv, &bus_p->bus_lbw_mutex);
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl &= ~PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl &= ~PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_state &= ~PCIE_LBW_S_ENABLED;
	kmem_free(bus_p->bus_lbw_pbuf, MAXPATHLEN);
	kmem_free(bus_p->bus_lbw_cbuf, MAXPATHLEN);
	bus_p->bus_lbw_pbuf = NULL;
	bus_p->bus_lbw_cbuf = NULL;

	mutex_destroy(&bus_p->bus_lbw_mutex);
	cv_destroy(&bus_p->bus_lbw_cv);

	return (DDI_SUCCESS);
}

void
pcie_link_bw_taskq(void *arg)
{
	dev_info_t *dip = arg;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	dev_info_t *cdip;
	boolean_t again;
	sysevent_t *se;
	sysevent_value_t se_val;
	sysevent_id_t eid;
	sysevent_attr_list_t *ev_attr_list;

top:
	ndi_devi_enter(dip);
	se = NULL;
	ev_attr_list = NULL;
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_DISPATCHED;
	bus_p->bus_lbw_state |= PCIE_LBW_S_RUNNING;
	mutex_exit(&bus_p->bus_lbw_mutex);

	/*
	 * Update our own speeds as we've likely changed something.
	 */
	pcie_capture_speeds(dip);

	/*
	 * Walk our children. We only care about updating this on function 0
	 * because the PCIe specification requires that these all be the same
	 * otherwise.
	 */
	for (cdip = ddi_get_child(dip); cdip != NULL;
	    cdip = ddi_get_next_sibling(cdip)) {
		pcie_bus_t *cbus_p = PCIE_DIP2BUS(cdip);

		if (cbus_p == NULL) {
			continue;
		}

		if ((cbus_p->bus_bdf & PCIE_REQ_ID_FUNC_MASK) != 0) {
			continue;
		}

		/*
		 * It's possible that this can fire while a child is otherwise
		 * only partially constructed. Therefore, if we don't have the
		 * config handle, don't bother updating the child.
		 */
		if (cbus_p->bus_cfg_hdl == NULL) {
			continue;
		}

		pcie_capture_speeds(cdip);
		break;
	}

	se = sysevent_alloc(EC_PCIE, ESC_PCIE_LINK_STATE,
	    ILLUMOS_KERN_PUB "pcie", SE_SLEEP);

	(void) ddi_pathname(dip, bus_p->bus_lbw_pbuf);
	se_val.value_type = SE_DATA_TYPE_STRING;
	se_val.value.sv_string = bus_p->bus_lbw_pbuf;
	if (sysevent_add_attr(&ev_attr_list, PCIE_EV_DETECTOR_PATH, &se_val,
	    SE_SLEEP) != 0) {
		ndi_devi_exit(dip);
		goto err;
	}

	if (cdip != NULL) {
		(void) ddi_pathname(cdip, bus_p->bus_lbw_cbuf);

		se_val.value_type = SE_DATA_TYPE_STRING;
		se_val.value.sv_string = bus_p->bus_lbw_cbuf;

		/*
		 * If this fails, that's OK. We'd rather get the event off and
		 * there's a chance that there may not be anything there for us.
		 */
		(void) sysevent_add_attr(&ev_attr_list, PCIE_EV_CHILD_PATH,
		    &se_val, SE_SLEEP);
	}

	ndi_devi_exit(dip);

	/*
	 * Before we generate and send down a sysevent, we need to tell the
	 * system that parts of the devinfo cache need to be invalidated. While
	 * the function below takes several args, it ignores them all. Because
	 * this is a global invalidation, we don't bother trying to do much more
	 * than requesting a global invalidation, lest we accidentally kick off
	 * several in a row.
	 */
	ddi_prop_cache_invalidate(DDI_DEV_T_NONE, NULL, NULL, 0);

	if (sysevent_attach_attributes(se, ev_attr_list) != 0) {
		goto err;
	}
	ev_attr_list = NULL;

	if (log_sysevent(se, SE_SLEEP, &eid) != 0) {
		goto err;
	}

err:
	sysevent_free_attr(ev_attr_list);
	sysevent_free(se);

	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_RUNNING;
	cv_broadcast(&bus_p->bus_lbw_cv);
	again = (bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) != 0;
	mutex_exit(&bus_p->bus_lbw_mutex);

	if (again) {
		goto top;
	}
}

int
pcie_link_bw_intr(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t linksts;
	uint16_t flags = PCIE_LINKSTS_LINK_BW_MGMT | PCIE_LINKSTS_AUTO_BW;
	hrtime_t now;

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	linksts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
	if ((linksts & flags) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	now = gethrtime();

	/*
	 * Check if we've already dispatched this event. If we have already
	 * dispatched it, then there's nothing else to do, we coalesce multiple
	 * events.
	 */
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_nevents++;
	bus_p->bus_lbw_last_ts = now;
	if ((bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) == 0) {
		if ((bus_p->bus_lbw_state & PCIE_LBW_S_RUNNING) == 0) {
			taskq_dispatch_ent(pcie_link_tq, pcie_link_bw_taskq,
			    dip, 0, &bus_p->bus_lbw_ent);
		}

		bus_p->bus_lbw_state |= PCIE_LBW_S_DISPATCHED;
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	PCIE_CAP_PUT(16, bus_p, PCIE_LINKSTS, flags);
	return (DDI_INTR_CLAIMED);
}

int
pcie_link_set_target(dev_info_t *dip, pcie_link_speed_t speed)
{
	uint16_t ctl2, rval;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	if (bus_p->bus_pcie_vers < 2) {
		return (ENOTSUP);
	}

	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_2_5;
		break;
	case PCIE_LINK_SPEED_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_5;
		break;
	case PCIE_LINK_SPEED_8:
		rval = PCIE_LINKCTL2_TARGET_SPEED_8;
		break;
	case PCIE_LINK_SPEED_16:
		rval = PCIE_LINKCTL2_TARGET_SPEED_16;
		break;
	case PCIE_LINK_SPEED_32:
		rval = PCIE_LINKCTL2_TARGET_SPEED_32;
		break;
	case PCIE_LINK_SPEED_64:
		rval = PCIE_LINKCTL2_TARGET_SPEED_64;
		break;
	default:
		return (EINVAL);
	}

	mutex_enter(&bus_p->bus_speed_mutex);
	if ((bus_p->bus_sup_speed & speed) == 0) {
		mutex_exit(&bus_p->bus_speed_mutex);
		return (ENOTSUP);
	}

	bus_p->bus_target_speed = speed;
	bus_p->bus_speed_flags |= PCIE_LINK_F_ADMIN_TARGET;

	ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
	ctl2 &= ~PCIE_LINKCTL2_TARGET_SPEED_MASK;
	ctl2 |= rval;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL2, ctl2);
	mutex_exit(&bus_p->bus_speed_mutex);

	/*
	 * Make sure our updates have been reflected in devinfo.
	 */
	pcie_capture_speeds(dip);

	return (0);
}

int
pcie_link_retrain(dev_info_t *dip)
{
	uint16_t ctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	/*
	 * The PCIe specification suggests that we make sure that the link isn't
	 * in training before issuing this command in case there was a state
	 * machine transition prior to when we got here. We wait and then go
	 * ahead and issue the command anyways.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	ctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	ctl |= PCIE_LINKCTL_RETRAIN_LINK;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, ctl);

	/*
	 * Wait again to see if it clears before returning to the user.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	return (0);
}

/*
 * Here we're going through and grabbing information about a given PCIe device.
 * Our situation is a little bit complicated at this point. This gets invoked
 * both during early initialization and during hotplug events.
 * We cannot rely on the device node having been fully set up, that is, while
 * the pcie_bus_t normally contains a ddi_acc_handle_t for configuration space,
 * that may not be valid yet as this can occur before child initialization or
 * we may be dealing with a function that will never have a handle.
 *
 * However, we should always have a fully furnished pcie_bus_t, which means that
 * we can get its bdf and use that to access the device's configuration space.
 */
static int
pcie_fabric_feature_scan(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	uint32_t devcap;
	uint16_t mps;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;

	/*
	 * Skip over non-PCIe devices. If we encounter something here, we don't
	 * bother going through any of its children because we don't have reason
	 * to believe that a PCIe device that this will impact will exist below
	 * this. While it is possible that there's a PCIe fabric downstream of
	 * an intermediate old PCI/PCI-X bus, at that point, we'll still
	 * trigger our complex fabric detection and use the minimums.
	 *
	 * The reason this doesn't trigger an immediate flagging as a complex
	 * case like the one below is because we could be scanning a device that
	 * is a nexus driver and has children already (albeit that would be
	 * somewhat surprising as we don't anticipate being called at this
	 * point).
	 */
	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * If we fail to find a pcie_bus_t for some reason, that's somewhat
	 * surprising. We log this fact and set the complex flag and indicate it
	 * was because of this case. This immediately transitions us to a
	 * "complex" case which means use the minimal, safe, settings.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL) {
		dev_err(dip, CE_WARN, "failed to find associated pcie_bus_t "
		    "during fabric scan");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	/*
	 * In a similar case, there is hardware out there which is a PCIe
	 * device, but does not advertise a PCIe capability. An example of this
	 * is the IDT Tsi382A which can hide its PCIe capability. If this is
	 * the case, we immediately terminate scanning and flag this as a
	 * 'complex' case which causes us to use guaranteed safe settings.
	 */
	if (bus_p->bus_pcie_off == 0) {
		dev_err(dip, CE_WARN, "encountered PCIe device without PCIe "
		    "capability");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	rcdip = pcie_get_rc_dip(dip);

	/*
	 * First, start by determining what the device's tagging and max packet
	 * size are. All PCIe devices will always have the 8-bit tag information
	 * as this has existed since PCIe 1.0. 10-bit tagging requires a V2
	 * PCIe capability. 14-bit requires the DEV3 cap. If we are missing a
	 * version or capability, then we always treat that as lacking the bits
	 * in the fabric.
	 */
	ASSERT3U(bus_p->bus_pcie_off, !=, 0);
	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	mps = devcap & PCIE_DEVCAP_MAX_PAYLOAD_MASK;
	if (mps < fab->pfd_mps_found) {
		fab->pfd_mps_found = mps;
	}

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) == 0) {
		fab->pfd_tag_found &= ~PCIE_TAG_8B;
	}

	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);
		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
	}

	if (bus_p->bus_dev3_off != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);
		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
	}

	/*
	 * Now that we have captured device information, we must go and ask
	 * questions of the topology here. The big theory statement enumerates
	 * several types of cases. The big question we need to answer is have we
	 * encountered a hotpluggable bridge that means we need to mark this as
	 * complex.
	 *
	 * The big theory statement notes several different kinds of hotplug
	 * topologies that exist that we can theoretically support. Right now we
	 * opt to keep our lives simple and focus solely on (4) and (5). These
	 * can both be summarized by a single, fairly straightforward rule:
	 *
	 *    The only allowed hotpluggable entity is a root port.
	 *
	 * The reason that this can work and detect cases like (6), (7), and our
	 * other invalid ones is that the hotplug code will scan and find all
	 * children before we are called into here.
	 */
	if (bus_p->bus_hp_sup_modes != 0) {
		/*
		 * We opt to terminate in this case because there's no value in
		 * scanning the rest of the tree at this point.
		 */
		if (!PCIE_IS_RP(bus_p)) {
			fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
			return (DDI_WALK_TERMINATE);
		}

		fab->pfd_flags |= PCIE_FABRIC_F_RP_HP;
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and their children as those would be
	 * different PCIe fabric domains for us to scan. In many hardware
	 * platforms multiple root ports are all at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

static int
pcie_fabric_feature_set(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;
	uint32_t devcap, devctl;

	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * The missing bus_t sent us into the complex case previously. We still
	 * need to make sure all devices have values we expect here and thus
	 * don't terminate like the above. The same is true for the case where
	 * there is no PCIe capability.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || bus_p->bus_pcie_off == 0) {
		return (DDI_WALK_CONTINUE);
	}
	rcdip = pcie_get_rc_dip(dip);

	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	devctl = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL);

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_8B) != 0) {
		devctl |= PCIE_DEVCTL_EXT_TAG_FIELD_EN;
	}

	devctl &= ~PCIE_DEVCTL_MAX_PAYLOAD_MASK;
	ASSERT0(fab->pfd_mps_act & ~PCIE_DEVCAP_MAX_PAYLOAD_MASK);
	devctl |= fab->pfd_mps_act << PCIE_DEVCTL_MAX_PAYLOAD_SHIFT;

	pci_cfgacc_put16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL, devctl);

	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0 &&
	    (fab->pfd_tag_act & PCIE_TAG_10B_COMP) != 0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);

		/* Only enable the requester where it is supported. */
		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_REQ_SUP) != 0) {
			uint16_t devctl2 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_pcie_off + PCIE_DEVCTL2);
			devctl2 |= PCIE_DEVCTL2_10B_TAG_REQ_EN;
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_DEVCTL2, devctl2);
		}
	}

	if (bus_p->bus_dev3_off != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_14B_COMP) != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);

		/* Only enable the requester where it is supported. */
		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_REQ_SUP) != 0) {
			uint16_t devctl3 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_dev3_off + PCIE_DEVCTL3);
			devctl3 |= PCIE_DEVCTL3_14B_TAG_REQ_EN;
			/* DevCtl3 lives in the DEV3 capability. */
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_dev3_off + PCIE_DEVCTL3, devctl3);
		}
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and
	 * their children as those would be different PCIe fabric domains for
	 * us to scan. In many hardware platforms multiple root ports are all
	 * at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

/*
 * This is used to scan and determine the total set of PCIe fabric settings that
 * we should have in the system for everything downstream of this specified root
 * port. Note, it is only really safe to call this while working from the
 * perspective of a root port as we will be walking down the entire device tree.
 *
 * However, our callers, particularly hotplug, don't have all the information
 * we'd like. In particular, we need to check that:
 *
 * o This is actually a PCIe device.
 * o That this is a root port (see the big theory statement to understand this
 *   constraint).
 */
void
pcie_fabric_setup(dev_info_t *dip)
{
	pcie_bus_t *bus_p;
	pcie_fabric_data_t *fab;
	dev_info_t *pdip;

	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || !PCIE_IS_RP(bus_p)) {
		return;
	}

	VERIFY3P(bus_p->bus_fab, !=, NULL);
	fab = bus_p->bus_fab;

	/*
	 * For us to call ddi_walk_devs(), our parent needs to be held.
	 * ddi_walk_devs() will take care of grabbing our dip as part of its
	 * walk before we iterate over our children.
	 *
	 * A reasonable question to ask here is why is it safe to ask for our
	 * parent? In this case, because we have entered here through some
	 * thread that's operating on us whether as part of attach or a hotplug
	 * event, our dip somewhat by definition has to be valid. If we were
	 * looking at our dip's children and then asking them for a parent, then
	 * that would be a race condition.
	 */
	pdip = ddi_get_parent(dip);
	VERIFY3P(pdip, !=, NULL);
	ndi_devi_enter(pdip);
	fab->pfd_flags |= PCIE_FABRIC_F_SCANNING;

	/*
	 * Reinitialize the tracking structure to basically set the maximum
	 * caps. These will be chipped away during the scan.
	 */
	fab->pfd_mps_found = PCIE_DEVCAP_MAX_PAYLOAD_4096;
	fab->pfd_tag_found = PCIE_TAG_ALL;
	fab->pfd_flags &= ~PCIE_FABRIC_F_COMPLEX;

	ddi_walk_devs(dip, pcie_fabric_feature_scan, fab);

	if ((fab->pfd_flags & PCIE_FABRIC_F_COMPLEX) != 0) {
		fab->pfd_tag_act = PCIE_TAG_5B;
		fab->pfd_mps_act = PCIE_DEVCAP_MAX_PAYLOAD_128;
	} else {
		fab->pfd_tag_act = fab->pfd_tag_found;
		fab->pfd_mps_act = fab->pfd_mps_found;
	}

	ddi_walk_devs(dip, pcie_fabric_feature_set, fab);

	fab->pfd_flags &= ~PCIE_FABRIC_F_SCANNING;
	ndi_devi_exit(pdip);
}
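
/*
 * Illustrative sketch only; it is not part of the driver. pcie_fabric_setup()
 * seeds pfd_mps_found with the largest payload encoding and the scan then
 * lowers it to the smallest value that any device reports. Assuming the
 * standard PCIe encoding of the Max_Payload_Size field (0 means 128 bytes,
 * each step doubles, 5 means 4096 bytes), that reduction can be modelled as
 * below. The names pcie_example_mps_bytes(), pcie_example_min_mps(), and
 * PCIE_EXAMPLE_MPS_4096 are hypothetical and exist purely for illustration.
 */
#define	PCIE_EXAMPLE_MPS_4096	5U

/* Convert an encoded Max_Payload_Size value into a size in bytes. */
static inline unsigned int
pcie_example_mps_bytes(unsigned int code)
{
	return (128U << code);
}

/*
 * Return the smallest encoded payload size in codes[], starting from the
 * 4096-byte maximum in the same way that the fabric scan does.
 */
static inline unsigned int
pcie_example_min_mps(const unsigned int *codes, unsigned int ncodes)
{
	unsigned int min = PCIE_EXAMPLE_MPS_4096;

	for (unsigned int i = 0; i < ncodes; i++) {
		if (codes[i] < min)
			min = codes[i];
	}

	return (min);
}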