/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2019 Joyent, Inc.
 * Copyright 2022 Oxide Computer Company
 */

/*
 * PCIe Initialization
 * -------------------
 *
 * The PCIe subsystem is split about and initializes itself in a couple of
 * different places. This is due to the platform-specific nature of initializing
 * resources and the nature of the SPARC PROM and how that influenced the
 * subsystem. Note that traditional PCI (mostly seen these days in Virtual
 * Machines) follows most of the same basic path outlined here, but skips a
 * large chunk of PCIe-specific initialization.
 *
 * First, there is an initial device discovery phase that is taken care of by
 * the platform. This is where we discover the set of devices that are present
 * at system power on. These devices may or may not be hot-pluggable. In
 * particular, this happens in a platform-specific way right now.
 * In general, we expect most discovery to be driven by scanning each bus,
 * device, and function, and seeing what actually exists and responds to
 * configuration space reads. This is driven via pci_boot.c on x86. This may be
 * seeded by something like device tree, a PROM, supplemented with ACPI, or by
 * knowledge that the underlying platform has.
 *
 * As a part of this discovery process, the platform determines the full set of
 * resources that exist in the system for PCIe:
 *
 *   o PCI buses
 *   o Prefetchable Memory
 *   o Non-prefetchable memory
 *   o I/O ports
 *
 * This process is driven by a platform's PCI Platform Resource Discovery (PRD)
 * module. The PRD definitions can be found in <sys/plat/pci_prd.h> and are used
 * to discover these resources, which will be converted into the initial set of
 * the standard properties in the system: 'regs', 'available', 'ranges', etc.
 * Currently it is up to platform-specific code (which should ideally be
 * consolidated at some point) to set up all these properties.
 *
 * As a part of the discovery process, the platform code will create a device
 * node (dev_info_t) for each discovered function and will create a PCIe nexus
 * for each overall root complex that exists in the system. Most root complexes
 * will have multiple root ports, each of which is the foundation of an
 * independent PCIe bus due to the point-to-point nature of PCIe. When a root
 * complex is found, a nexus driver such as npe (Nexus for PCI Express) is
 * attached. In the case of a non-PCIe-capable system this is where the older
 * pci nexus driver would be used instead.
 *
 * To track data about a given device on a bus, a 'pcie_bus_t' structure is
 * created for and assigned to every PCIe-based dev_info_t. This can be used to
 * find the root port and get basic information about the device, its faults,
 * and related information. This contains pointers to the corresponding root
 * port as well.
 *
 * A root complex has its pcie_bus_t initialized as part of the device discovery
 * process. That is, because we're trying to bootstrap the actual tree and most
 * platforms don't have a representation for this that's explicitly
 * discoverable, this is created manually. See callers of pcie_rc_init_bus().
 *
 * For other devices, bridges, and switches, the process is split into two.
 * There is an initial pcie_bus_t that is created which will exist before we go
 * through the actual driver attachment process. For example, on x86 this is
 * done as part of the device and function discovery. The second pass of
 * initialization is done only after the nexus driver actually is attached and
 * it goes through and finishes processing all of its children.
 *
 * Child Initialization
 * --------------------
 *
 * Generally speaking, the platform will first enumerate all PCIe devices that
 * are in the system before it actually creates a device tree. This is part of
 * the bus/device/function scanning that is performed and from that dev_info_t
 * nodes are created for each discovered device and are inserted into the
 * broader device tree. Later in boot, the actual device tree is walked and the
 * nodes go through the standard dev_info_t initialization process (DS_PROTO,
 * DS_LINKED, DS_BOUND, etc.).
 *
 * PCIe-specific initialization can roughly be broken into the following pieces:
 *
 *   1. Platform initial discovery and resource assignment
 *   2. The pcie_bus_t initialization
 *   3. Nexus driver child initialization
 *   4. Fabric initialization
 *   5. Device driver-specific initialization
 *
 * The first part of this (1) and (2) are discussed in the previous section.
 * Part (1) in particular is a combination of the PRD (platform resource
 * discovery) and general device initialization. After this, because we have a
 * device tree, most of the standard nexus initialization happens.
 *
 * (5) is somewhat simple, so let's get into it before we discuss (3) and (4).
 * This is the last thing that is called and that happens after all of the
 * others are done. This is the logic that occurs in a driver's attach(9E) entry
 * point. This is always device-specific and generally speaking should not be
 * manipulating standard PCIe registers directly on their own. For example, the
 * MSI/MSI-X, AER, Serial Number, etc. capabilities will be automatically dealt
 * with by the framework in (3) and (4) below. In many cases, particularly
 * things that are part of (4), adjusting them in the individual driver is not
 * safe.
 *
 * Finally, let's talk about (3) and (4) as these are related. The NDI provides
 * for a standard hook for a nexus to initialize its children. In our platforms,
 * there are basically two possible PCIe nexus drivers: there is the generic
 * pcieb -- PCIe bridge -- driver which is used for standard root ports,
 * switches, etc. Then there is the platform-specific primary nexus driver,
 * which is being slowly consolidated into a single one where it makes sense. An
 * example of this is npe.
 *
 * Each of these has a child initialization function which is called from their
 * DDI_CTLOPS_INITCHILD operation on the bus_ctl function pointer. This goes
 * through and initializes a large number of different pieces of PCIe-based
 * settings through the common pcie_initchild() function. This takes care of
 * things like:
 *
 *   o Advanced Error Reporting
 *   o Alternative Routing
 *   o Capturing information around link speed, width, serial numbers, etc.
 *   o Setting common properties around aborts
 *
 * There are a few caveats with this that need to be kept in mind:
 *
 *   o A dev_info_t indicates a specific function.
 *     This means that a multi-function device will not all be initialized at
 *     the same time and there is no guarantee that all children will be
 *     initialized before one of them is attached.
 *   o A child is only initialized if we have found a driver that matches an
 *     alias in the dev_info_t's compatible array property. While a lot of
 *     multi-function devices are often multiple instances of the same thing
 *     (e.g. a multi-port NIC with a function / NIC), this is not always the
 *     case and one cannot make any assumptions here.
 *
 * This in turn leads to the next form of initialization that takes place in the
 * case of (4). This is where we take care of things that need to be consistent
 * across either entire devices or more generally across an entire root port and
 * all of its children. There are a few different examples of this:
 *
 *   o Setting the maximum packet size
 *   o Determining the tag width
 *
 * Note that features which are only based on function 0, such as ASPM (Active
 * State Power Management), hardware autonomous width disable, etc. ultimately
 * do not go through this path today. There are some implications here in that
 * today several of these things are captured on functions which may not have
 * any control here. This is an area of needed improvement.
 *
 * The settings in (4) are initialized in a common way, via
 * pcie_fabric_setup(). This is called into from two different parts of
 * the stack:
 *
 *   1. When we attach a root port, which is driven by pcieb.
 *   2. When we have a hotplug event that adds a device.
 *
 * In general here we are going to use the term 'fabric' to refer to everything
 * that is downstream of a root port. This corresponds to what the PCIe
 * specification calls a 'hierarchy domain'.
 * Strictly speaking, this is fine until peer-to-peer requests begin to happen
 * that cause you to need to forward things across root ports. At that point the
 * scope of the fabric increases and these settings become more complicated. We
 * currently optimize for the much more common case, which is that each root
 * port is effectively independent from a PCIe transaction routing perspective.
 *
 * Put differently, we use the term 'fabric' to refer to a set of PCIe devices
 * that can route transactions to one another, which is generally constrained to
 * everything under a root port and that root ports are independent. If this
 * constraint changes, then all one needs to do is replace the discussion of the
 * root port below with the broader root complex and system.
 *
 * A challenge with these settings is that once they're set and devices are
 * actively making requests, we cannot really change them without resetting the
 * links and cancelling all outstanding transactions via device resets. Because
 * this is not something that we want to do, we instead look at how and when we
 * set this to constrain what's going on.
 *
 * Because of this we basically say that if a given fabric has more than one
 * hot-plug capable device that's encountered, then we have to use safe defaults
 * (which we can allow an operator to tune eventually via pcieadm). If we have a
 * mix of non-hotpluggable slots with downstream endpoints present and
 * hot-pluggable slots, then we're in this case. If we don't have hot-pluggable
 * slots, then we can have an arbitrarily complex setup. Let's look at a few of
 * these visually:
 *
 * In the following diagrams, RP stands for Root Port, EP stands for Endpoint.
 * If something is hot-pluggable, then we label it with (HP).
 *
 *   (1) RP --> EP
 *   (2) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> EP
 *
 *   (3) RP --> Switch --> EP
 *                    +--> EP
 *                    +--> Switch --> EP
 *                               +--> EP
 *                    +--> EP
 *
 *
 *   (4) RP (HP) --> EP
 *   (5) RP (HP) --> Switch --> EP
 *                         +--> EP
 *                         +--> EP
 *
 *   (6) RP --> Switch (HP) --> EP
 *   (7) RP (HP) --> Switch (HP) --> EP
 *
 * If we look at all of these, these are all cases where it's safe for us to set
 * things based on all devices. (1), (2), and (3) are straightforward because
 * they have no hot-pluggable elements. This means that nothing should come/go
 * on the system and we can set up fabric-wide properties as part of the root
 * port.
 *
 * Case (4) is the most standard one that we encounter for hot-plug. Here you
 * have a root port directly connected to an endpoint. The most common example
 * would be an NVMe device plugged into a root port. Case (5) is interesting to
 * highlight. While there is a switch and multiple endpoints there, they are
 * showing up as a unit. This ends up being a weirder variant of (4), but it is
 * safe for us to set advanced properties because we can figure out what the
 * total set should be.
 *
 * Now, the more interesting bits here are (6) and (7). The reason that (6)
 * works is that ultimately there is only a single down-stream port here that is
 * hot-pluggable and all non-hotpluggable ports do not have a device present,
 * which suggests that they will never have a device present. (7) also could be
 * made to work by making the observation that if there's truly only one
 * endpoint in a fabric, it doesn't matter how many switches there are that are
 * hot-pluggable. This would only hold if we can assume for some reason that no
 * other endpoints could be added.
 *
 * In turn, let's look at several cases that we believe aren't safe:
 *
 *   (8) RP --> Switch --> EP
 *                    +--> EP
 *               (HP) +--> EP
 *
 *   (9) RP --> Switch (HP) +--> EP
 *                     (HP) +--> EP
 *
 *   (10) RP (HP) --> Switch (HP) +--> EP
 *                           (HP) +--> EP
 *
 * All of these are situations where it's much more explicitly unsafe. Let's
 * take (8). The problem here is that the devices on the non-hotpluggable
 * downstream switches are always there and we should assume all device drivers
 * will be active and performing I/O when the hot-pluggable slot changes. If the
 * hot-pluggable slot has a lower max payload size, then we're mostly out of
 * luck. The case of (9) is very similar to (8), just that we have more hot-plug
 * capable slots.
 *
 * Finally (10) is a case of multiple instances of hotplug. (9) and (10) are the
 * more general case of (6) and (7). While we can try to detect (6) and (7) more
 * generally or try to make it safe, we're going to start with a simpler form of
 * detection for this, which roughly follows the following rules:
 *
 *   o If there are no hot-pluggable slots in an entire fabric, then we can set
 *     all fabric properties based on device capabilities.
 *   o If we encounter a hot-pluggable slot, we can only set fabric properties
 *     based on device capabilities if:
 *
 *       1. The hotpluggable slot is a root port.
 *       2. There are no other hotpluggable devices downstream of it.
 *
 * Otherwise, if either of the above does not hold, then we must use the basic
 * PCIe defaults for various fabric-wide properties (discussed below). Even in
 * these more complicated cases, device-specific properties such as the
 * configuration of AERs, ASPM, etc. are still handled in the general
 * pcie_init_bus() and related functions discussed earlier here.
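 *
 * As an illustration only, the detection rules above can be sketched as
 * follows. This is not the code in this file: struct fab_node, fab_count_hp(),
 * and fab_is_simple() are hypothetical names, and the sketch assumes every
 * port and endpoint in a fabric is modelled as one node with child and sibling
 * links. Note that under these simple rules, cases (6) and (7) are treated as
 * needing the PCIe defaults.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for a device node; this is not the real dev_info_t.
struct fab_node {
	bool fn_hotplug;		// this port's slot is hot-pluggable
	struct fab_node *fn_child;	// first node below this one
	struct fab_node *fn_sibling;	// next node at the same level
};

// Count hot-pluggable ports in np, its siblings, and everything below them.
static unsigned int
fab_count_hp(const struct fab_node *np)
{
	unsigned int count = 0;

	for (; np != NULL; np = np->fn_sibling) {
		if (np->fn_hotplug)
			count++;
		count += fab_count_hp(np->fn_child);
	}
	return (count);
}

// True when fabric-wide properties may be derived from device capabilities:
// either nothing is hot-pluggable, or the root port is the only
// hot-pluggable slot in the whole fabric.
bool
fab_is_simple(const struct fab_node *rp)
{
	unsigned int nhp;

	assert(rp != NULL);
	nhp = (rp->fn_hotplug ? 1 : 0) + fab_count_hp(rp->fn_child);
	if (nhp == 0)
		return (true);
	return (nhp == 1 && rp->fn_hotplug);
}
```
 *
 * With this sketch, a hot-pluggable root port with a single endpoint (case 4)
 * is simple, while a non-hotpluggable root port above a hot-pluggable switch
 * port (case 6) is not.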
 *
 * Because the only fabrics that we'll change are those that correspond to root
 * ports, we will only call into the actual fabric feature setup when one of
 * those changes. This has the side effect of simplifying locking. When we make
 * changes here we need to be able to hold the entire device tree under the root
 * port (including the root port and its parent). This is much harder to do
 * safely when starting in the middle of the tree.
 *
 * Handling of Specific Properties
 * -------------------------------
 *
 * This section goes into the rationale behind how we initialize and program
 * various parts of the PCIe stack.
 *
 * 5-, 8-, 10- AND 14-BIT TAGS
 *
 * Tags are part of PCIe transactions and when combined with a device identifier
 * are used to uniquely identify a transaction. In PCIe parlance, a Requester
 * (someone who initiates a PCIe request) sets a unique tag in the request and
 * the Completer (someone who processes and responds to a PCIe request) echoes
 * the tag back. This means that a requester generally is responsible for
 * ensuring that they don't reuse a tag between transactions.
 *
 * Thus the number of tags that a device has relates to the number of
 * outstanding transactions that it can have, which are usually tied to the
 * number of outstanding DMA transfers. The size of these transactions is also
 * then scoped by the handling of the Maximum Packet Payload.
 *
 * In PCIe 1.0, devices default to a 5-bit tag. There was also an option to
 * support an 8-bit tag. The 8-bit extended tag did not distinguish between a
 * Requester or Completer. There was a bit to indicate device support of 8-bit
 * tags in the Device Capabilities Register of the PCIe Capability and a
 * separate bit to enable it in the Device Control Register of the PCIe
 * Capability.
 *
 * In PCIe 4.0, support for a 10-bit tag was added.
 * The specification broke apart the support bit into multiple pieces. In
 * particular, in the Device Capabilities 2 register of the PCIe Capability
 * there is a separate bit to indicate whether the device supports 10-bit
 * completions and 10-bit requests. All PCIe 4.0 compliant devices are required
 * to support 10-bit tags if they operate at 16.0 GT/s speed (a PCIe Gen 4
 * compliant device does not have to operate at Gen 4 speeds).
 *
 * This allows a device to support 10-bit completions but not 10-bit requests.
 * A device that supports 10-bit requests is required to support 10-bit
 * completions. There is no ability to enable or disable 10-bit completion
 * support in the Device Control 2 register. There is only a bit to enable
 * 10-bit requests. This distinction makes our life easier as this means that as
 * long as the entire fabric supports 10-bit completions, it doesn't matter if
 * not all devices support 10-bit requests and we can enable them as required.
 * More on this in a bit.
 *
 * In PCIe 6.0, another set of bits was added for 14-bit tags. These follow the
 * same pattern as the 10-bit tags. The biggest difference is that the
 * capabilities and control for these are found in the Device Capabilities 3
 * and Device Control 3 register of the Device 3 Extended Capability. Similar to
 * what we see with 10-bit tags, requesters are required to support the
 * completer capability. The only control bit is for whether or not they enable
 * a 14-bit requester.
 *
 * PCIe switches sit between root ports and endpoints and show up to software as
 * a set of bridges. Bridges generally don't have to know about tags as they are
 * usually neither requesters nor completers (unless directly talking to the
 * bridge instance). That is, they are generally required to forward packets
 * without modifying them. This works until we deal with switch error handling.
 * At that point, the switch may try to interpret the transaction and if it
 * doesn't understand the tagging scheme in use, return the transaction with the
 * wrong tag and also an incorrectly diagnosed error (usually a malformed TLP).
 *
 * With all this, we construct a somewhat simple policy of how and when we
 * enable extended tags:
 *
 *   o If we have a complex hotplug-capable fabric (based on the discussion
 *     earlier in fabric-specific settings), then we cannot enable any of the
 *     8-bit, 10-bit, and 14-bit tagging features. This is due to the issues
 *     with intermediate PCIe switches and related.
 *
 *   o If every device supports 8-bit capable tags, then we will go through and
 *     enable those everywhere.
 *
 *   o If every device supports 10-bit capable completions, then we will enable
 *     10-bit requester on every device that supports it.
 *
 *   o If every device supports 14-bit capable completions, then we will enable
 *     14-bit requesters on every device that supports it.
 *
 * This is the simpler end of the policy and one that is relatively easy to
 * implement. While we could attempt to relax the constraint that every device
 * in the fabric implement these features by making assumptions about
 * peer-to-peer requests (that is, devices at the same layer in the tree won't
 * talk to one another), that is a lot of complexity. For now, we leave such an
 * implementation to those who need it in the future.
 *
 * MAX PAYLOAD SIZE
 *
 * When performing transactions on the PCIe bus, a given transaction has a
 * maximum allowed size. This size is called the MPS or 'Maximum Payload Size'.
 * A given device reports its maximum supported size in the Device Capabilities
 * register of the PCIe Capability. It is then set in the Device Control
 * register.
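 *
 * To make the register layout concrete, here is a small sketch of how the MPS
 * is encoded. It assumes the field positions from the PCIe base specification
 * (Device Capabilities bits 2:0 for the supported size, Device Control bits
 * 7:5 for the programmed size, with an encoded value n meaning 128 << n
 * bytes). The EX_* macros and ex_*() functions are hypothetical names used
 * only for illustration; the real definitions used by this file live in the
 * PCIe headers.
 *
```c
#include <assert.h>
#include <stdint.h>

#define	EX_DEVCAP_MPS_MASK	0x0007u	// Device Capabilities bits 2:0
#define	EX_DEVCTL_MPS_SHIFT	5	// Device Control bits 7:5
#define	EX_DEVCTL_MPS_MASK	0x00e0u
#define	EX_MPS_ENC_MAX		5u	// 4096 bytes; 6 and 7 are reserved

// Convert an encoded MPS value to bytes; returns 0 for reserved encodings.
uint32_t
ex_mps_bytes(uint32_t enc)
{
	return (enc <= EX_MPS_ENC_MAX ? (128u << enc) : 0);
}

// Extract the encoded supported MPS from a Device Capabilities value.
uint32_t
ex_devcap_mps(uint32_t devcap)
{
	return (devcap & EX_DEVCAP_MPS_MASK);
}

// Return devctl with its MPS field replaced by enc, leaving other bits alone.
uint16_t
ex_devctl_set_mps(uint16_t devctl, uint32_t enc)
{
	assert(enc <= EX_MPS_ENC_MAX);
	devctl &= (uint16_t)~EX_DEVCTL_MPS_MASK;
	devctl |= (uint16_t)(enc << EX_DEVCTL_MPS_SHIFT);
	return (devctl);
}
```
 *
 * For instance, an encoded value of 0 is the 128 byte minimum that every
 * device must support, and an encoded value of 2 is 512 bytes.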
 *
 * One of the challenges with this value is that different functions of a device
 * have independent values, but strictly speaking are required to actually have
 * the same value programmed in all of them lest device behavior goes awry. When
 * a device has the ARI (alternative routing ID) capability enabled, then only
 * function 0 controls the actual payload size.
 *
 * The settings for this need to be consistent throughout the fabric. A
 * Transmitter is not allowed to create a TLP that exceeds its maximum packet
 * size and a Receiver is not allowed to receive a packet that exceeds its
 * maximum packet size. In all of these cases, this would result in something
 * like a malformed TLP error.
 *
 * Effectively, this means that everything on a given fabric must have the same
 * value programmed in its Device Control register for this value. While in the
 * case of tags, switches generally weren't completers or requesters, here every
 * device along the path is subject to this. This makes the actual value that we
 * set throughout the fabric even more important and the constraints of hotplug
 * even worse to deal with.
 *
 * Because a hotplug device can be inserted with any packet size, if we hit
 * anything other than the simple hotplug cases discussed in the fabric-specific
 * settings section, then we must use the smallest size of 128 byte payloads.
 * This is because a device could be plugged in that supports something smaller
 * than we had otherwise set. If there are other active devices, those could not
 * be changed without quiescing the entire fabric. As such our algorithm is as
 * follows:
 *
 *   1. Scan the entire fabric, keeping track of the smallest seen MPS in the
 *      Device Capabilities Register.
 *   2.
 *      If we have a complex fabric, program each Device Control register with
 *      a 128 byte maximum payload size, otherwise, program it with the
 *      discovered value.
 *
 *
 * MAX READ REQUEST SIZE
 *
 * The maximum read request size (MRRS) is a much more confusing thing when
 * compared to the maximum payload size counterpart. The maximum payload size
 * (MPS) above is what restricts the actual size of a TLP. The MRRS value
 * is used to control part of the behavior of Memory Read Request, which is not
 * strictly speaking subject to the MPS. A PCIe device is allowed to respond to
 * a Memory Read Request with fewer bytes than were actually requested in a
 * single completion. In general, the default size that a root complex and its
 * root port will reply with is based around the length of a cache line.
 *
 * What this ultimately controls is the number of requests that the Requester
 * has to make and trades off bandwidth, bus sharing, and related here. For
 * example, if the maximum read request size is 4 KiB, then the requester would
 * only issue a single read request asking for 4 KiB. It would still receive
 * these as multiple packets in units of the MPS. If however, the maximum read
 * request was only say 512 B, then it would need to make 8 separate requests,
 * potentially increasing latency. On the other hand, if systems are relying on
 * total requests for QoS, then it's important to set it to something that's
 * closer to the actual MPS.
 *
 * Traditionally, the OS has not been the most straightforward about this. It's
 * important to remember that setting this up is also somewhat in the realm of
 * system firmware. Due to the PCI Firmware specification, the firmware may have
 * set up a value for not just the MRRS but also the MPS.
 * As such, our logic basically left the MRRS alone and used whatever the device
 * had there as long as we weren't shrinking the device's MPS. If we were, then
 * we'd set it to the MPS. If the device was a root port, then it was just left
 * at a system wide and PCIe default of 512 bytes.
 *
 * If we survey firmware (which isn't easy due to its nature), we have seen most
 * cases where the firmware just doesn't do anything and leaves it to the
 * device's default, which is basically just the PCIe default, unless it has a
 * specific knowledge of something like say wanting to do something for an NVMe
 * device. The same is generally true of other systems, leaving it at its
 * default unless otherwise set by a device driver.
 *
 * Because this value doesn't really have the same constraints as other fabric
 * properties, this becomes much simpler and we instead opt to set it as part of
 * the device node initialization. In addition, there are no real rules about
 * different functions having different values here as it doesn't really impact
 * the TLP processing the same way that the MPS does.
 *
 * While we should add a fuller way of setting this and allowing operator
 * override of the MRRS based on things like device class, etc. that is driven
 * by pcieadm, that is left to the future. For now we opt to keep all devices at
 * their default (512 bytes or whatever firmware left behind) and we ensure that
 * root ports always have the MRRS set to 512.
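 *
 * As a rough sketch only, the two-step MPS algorithm from the MAX PAYLOAD SIZE
 * section and the read request arithmetic from this section can be written as
 * below. The names ex_fabric_mps() and ex_read_requests() are hypothetical and
 * the fabric is simplified to a flat array of encoded Device Capabilities MPS
 * values (where 0 means 128 bytes); the real code walks the device tree.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_MPS_ENC_128	0u	// encoded value for a 128 byte payload

// Step 1: find the smallest supported MPS encoding across the fabric.
// Step 2: a complex fabric always gets 128 bytes, otherwise use the minimum.
uint32_t
ex_fabric_mps(const uint32_t *supported, size_t ndevs, bool complex_fabric)
{
	uint32_t min;
	size_t i;

	assert(supported != NULL && ndevs > 0);
	min = supported[0];
	for (i = 1; i < ndevs; i++) {
		if (supported[i] < min)
			min = supported[i];
	}
	return (complex_fabric ? EX_MPS_ENC_128 : min);
}

// Number of Memory Read Requests needed to read len bytes with a given
// maximum read request size, both in bytes.
uint32_t
ex_read_requests(uint32_t len, uint32_t mrrs)
{
	assert(mrrs != 0);
	return ((len + mrrs - 1) / mrrs);
}
```
 *
 * This matches the example in the text: reading 4 KiB takes one request with
 * a 4 KiB MRRS and eight requests with a 512 byte MRRS.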
 */

#include <sys/sysmacros.h>
#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/modctl.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/sunndi.h>
#include <sys/fm/protocol.h>
#include <sys/fm/util.h>
#include <sys/promif.h>
#include <sys/disp.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <sys/pci_cap.h>
#include <sys/pci_impl.h>
#include <sys/pcie_impl.h>
#include <sys/hotplug/pci/pcie_hp.h>
#include <sys/hotplug/pci/pciehpc.h>
#include <sys/hotplug/pci/pcishpc.h>
#include <sys/hotplug/pci/pcicfg.h>
#include <sys/pci_cfgacc.h>
#include <sys/sysevent.h>
#include <sys/sysevent/eventdefs.h>
#include <sys/sysevent/pcie.h>

/* Local functions prototypes */
static void pcie_init_pfd(dev_info_t *);
static void pcie_fini_pfd(dev_info_t *);

#if defined(__x86)
static void pcie_check_io_mem_range(ddi_acc_handle_t, boolean_t *, boolean_t *);
#endif /* defined(__x86) */

#ifdef DEBUG
uint_t pcie_debug_flags = 0;
static void pcie_print_bus(pcie_bus_t *bus_p);
void pcie_dbg(char *fmt, ...);
#endif /* DEBUG */

/* Variable to control default PCI-Express config settings */
ushort_t pcie_command_default =
    PCI_COMM_SERR_ENABLE |
    PCI_COMM_WAIT_CYC_ENAB |
    PCI_COMM_PARITY_DETECT |
    PCI_COMM_ME |
    PCI_COMM_MAE |
    PCI_COMM_IO;

/* xxx_fw are bits that are controlled by FW and should not be modified */
ushort_t pcie_command_default_fw =
    PCI_COMM_SPEC_CYC |
    PCI_COMM_MEMWR_INVAL |
    PCI_COMM_PALETTE_SNOOP |
    PCI_COMM_WAIT_CYC_ENAB |
    0xF800; /* Reserved Bits */

ushort_t pcie_bdg_command_default_fw =
    PCI_BCNF_BCNTRL_ISA_ENABLE |
    PCI_BCNF_BCNTRL_VGA_ENABLE |
    0xF000; /* Reserved Bits */

/* PCI-Express Base error defaults */
ushort_t pcie_base_err_default =
    PCIE_DEVCTL_CE_REPORTING_EN |
    PCIE_DEVCTL_NFE_REPORTING_EN |
    PCIE_DEVCTL_FE_REPORTING_EN |
    PCIE_DEVCTL_UR_REPORTING_EN;

/* PCI-Express Device Control Register */
uint16_t pcie_devctl_default = PCIE_DEVCTL_RO_EN |
    PCIE_DEVCTL_MAX_READ_REQ_512;

/* PCI-Express AER Root Control Register */
#define	PCIE_ROOT_SYS_ERR	(PCIE_ROOTCTL_SYS_ERR_ON_CE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN | \
				PCIE_ROOTCTL_SYS_ERR_ON_FE_EN)

ushort_t pcie_root_ctrl_default =
    PCIE_ROOTCTL_SYS_ERR_ON_CE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN |
    PCIE_ROOTCTL_SYS_ERR_ON_FE_EN;

/* PCI-Express Root Error Command Register */
ushort_t pcie_root_error_cmd_default =
    PCIE_AER_RE_CMD_CE_REP_EN |
    PCIE_AER_RE_CMD_NFE_REP_EN |
    PCIE_AER_RE_CMD_FE_REP_EN;

/* ECRC settings in the PCIe AER Control Register */
uint32_t pcie_ecrc_value =
    PCIE_AER_CTL_ECRC_GEN_ENA |
    PCIE_AER_CTL_ECRC_CHECK_ENA;

/*
 * If a particular platform wants to disable certain errors such as UR/MA,
 * instead of using #defines have the platform's PCIe Root Complex driver set
 * these masks using the pcie_get_XXX_mask and pcie_set_XXX_mask functions. For
 * x86 the closest thing to a PCIe root complex driver is NPE. For SPARC the
 * closest PCIe root complex driver is PX.
 *
 * pcie_serr_disable_flag : disable SERR only (in RCR and command reg) x86
 * systems may want to disable SERR in general. For root ports, enabling SERR
 * causes NMIs which are not handled and results in a watchdog timeout error.
 */
uint32_t pcie_aer_uce_mask = 0;		/* AER UE Mask */
uint32_t pcie_aer_ce_mask = 0;		/* AER CE Mask */
uint32_t pcie_aer_suce_mask = 0;	/* AER Secondary UE Mask */
uint32_t pcie_serr_disable_flag = 0;	/* Disable SERR */

/*
 * Default severities needed for eversholt.
 * Error handling doesn't care.
 */
uint32_t pcie_aer_uce_severity = PCIE_AER_UCE_MTLP | PCIE_AER_UCE_RO | \
    PCIE_AER_UCE_FCP | PCIE_AER_UCE_SD | PCIE_AER_UCE_DLP | \
    PCIE_AER_UCE_TRAINING;
uint32_t pcie_aer_suce_severity = PCIE_AER_SUCE_SERR_ASSERT | \
    PCIE_AER_SUCE_UC_ADDR_ERR | PCIE_AER_SUCE_UC_ATTR_ERR | \
    PCIE_AER_SUCE_USC_MSG_DATA_ERR;

int pcie_disable_ari = 0;

/*
 * On some platforms, such as the AMD B450 chipset, we've seen an odd
 * relationship between enabling link bandwidth notifications and AERs about
 * ECRC errors. This provides a mechanism to disable it.
 */
int pcie_disable_lbw = 0;

/*
 * Amount of time to wait for an in-progress retraining. The default is to try
 * 500 times in 10ms chunks, thus a total of 5s.
 */
uint32_t pcie_link_retrain_count = 500;
uint32_t pcie_link_retrain_delay_ms = 10;

taskq_t *pcie_link_tq;
kmutex_t pcie_link_tq_mutex;

static int pcie_link_bw_intr(dev_info_t *);
static void pcie_capture_speeds(dev_info_t *);

dev_info_t *pcie_get_rc_dip(dev_info_t *dip);

/*
 * modload support
 */

static struct modlmisc modlmisc	= {
	&mod_miscops,	/* Type	of module */
	"PCI Express Framework Module"
};

static struct modlinkage modlinkage = {
	MODREV_1,
	(void	*)&modlmisc,
	NULL
};

/*
 * Global Variables needed for a non-atomic version of ddi_fm_ereport_post.
 * Currently used to send the pci.fabric ereports whose payload depends on the
 * type of PCI device it is being sent for.
 */
char		*pcie_nv_buf;
nv_alloc_t	*pcie_nvap;
nvlist_t	*pcie_nvl;

int
_init(void)
{
	int rval;

	pcie_nv_buf = kmem_alloc(ERPT_DATA_SZ, KM_SLEEP);
	pcie_nvap = fm_nva_xcreate(pcie_nv_buf, ERPT_DATA_SZ);
	pcie_nvl = fm_nvlist_create(pcie_nvap);
	mutex_init(&pcie_link_tq_mutex, NULL, MUTEX_DRIVER, NULL);

	if ((rval = mod_install(&modlinkage)) != 0) {
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_fini()
{
	int		rval;

	if ((rval = mod_remove(&modlinkage)) == 0) {
		if (pcie_link_tq != NULL) {
			taskq_destroy(pcie_link_tq);
		}
		mutex_destroy(&pcie_link_tq_mutex);
		fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN);
		fm_nva_xdestroy(pcie_nvap);
		kmem_free(pcie_nv_buf, ERPT_DATA_SZ);
	}
	return (rval);
}

int
_info(struct modinfo *modinfop)
{
	return (mod_info(&modlinkage, modinfop));
}

/* ARGSUSED */
int
pcie_init(dev_info_t *dip, caddr_t arg)
{
	int	ret = DDI_SUCCESS;

	/*
	 * Our _init function is too early to create a taskq. Create the pcie
	 * link management taskq here now instead.
	 */
	mutex_enter(&pcie_link_tq_mutex);
	if (pcie_link_tq == NULL) {
		pcie_link_tq = taskq_create("pcie_link", 1, minclsyspri, 0, 0,
		    0);
	}
	mutex_exit(&pcie_link_tq_mutex);

	/*
	 * Create a "devctl" minor node to support DEVCTL_DEVICE_*
	 * and DEVCTL_BUS_* ioctls to this bus.
	 */
	if ((ret = ddi_create_minor_node(dip, "devctl", S_IFCHR,
	    PCI_MINOR_NUM(ddi_get_instance(dip), PCI_DEVCTL_MINOR),
	    DDI_NT_NEXUS, 0)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to create devctl minor node for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if ((ret = pcie_hp_init(dip, arg)) != DDI_SUCCESS) {
		/*
		 * On some x86 platforms, we observed unexpected hotplug
		 * initialization failures in recent years. The known cause
		 * is a hardware issue: while the problem PCI bridges have
		 * the Hotplug Capable registers set, the machine actually
		 * does not implement the expected ACPI object.
		 *
		 * We don't want to stop PCI driver attach and system boot
		 * just because of this hotplug initialization failure.
		 * Continue with a debug message printed.
		 */
		PCIE_DBG("%s%d: Failed setting hotplug framework\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

#if defined(__sparc)
		ddi_remove_minor_node(dip, "devctl");

		return (ret);
#endif /* defined(__sparc) */
	}

	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_uninit(dev_info_t *dip)
{
	int ret = DDI_SUCCESS;

	if (pcie_ari_is_enabled(dip) == PCIE_ARI_FORW_ENABLED)
		(void) pcie_ari_disable(dip);

	if ((ret = pcie_hp_uninit(dip)) != DDI_SUCCESS) {
		PCIE_DBG("Failed to uninitialize hotplug for %s%d\n",
		    ddi_driver_name(dip), ddi_get_instance(dip));

		return (ret);
	}

	if (pcie_link_bw_supported(dip)) {
		(void) pcie_link_bw_disable(dip);
	}

	ddi_remove_minor_node(dip, "devctl");

	return (ret);
}

/*
 * PCIe module interface for enabling the hotplug interrupt.
 *
 * It should be called after pcie_init() is done and the bus driver's
 * interrupt handlers have been attached.
 */
int
pcie_hpintr_enable(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.enable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_enable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/*
 * PCIe module interface for disabling the hotplug interrupt.
 *
 * It should be called before pcie_uninit() is called and the bus driver's
 * interrupt handlers are detached.
 */
int
pcie_hpintr_disable(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip);

	if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) {
		(void) (ctrl_p->hc_ops.disable_hpc_intr)(ctrl_p);
	} else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) {
		(void) pcishpc_disable_irqs(ctrl_p);
	}
	return (DDI_SUCCESS);
}

/* ARGSUSED */
int
pcie_intr(dev_info_t *dip)
{
	int hp, lbw;

	hp = pcie_hp_intr(dip);
	lbw = pcie_link_bw_intr(dip);

	if (hp == DDI_INTR_CLAIMED || lbw == DDI_INTR_CLAIMED) {
		return (DDI_INTR_CLAIMED);
	}

	return (DDI_INTR_UNCLAIMED);
}

/* ARGSUSED */
int
pcie_open(dev_info_t *dip, dev_t *devp, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	/*
	 * Make sure the open is for the right file type.
	 */
	if (otyp != OTYP_CHR)
		return (EINVAL);

	/*
	 * Handle the open by tracking the device state.
	 */
	if ((bus_p->bus_soft_state == PCI_SOFT_STATE_OPEN_EXCL) ||
	    ((flags & FEXCL) &&
	    (bus_p->bus_soft_state != PCI_SOFT_STATE_CLOSED))) {
		return (EBUSY);
	}

	if (flags & FEXCL)
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN_EXCL;
	else
		bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN;

	return (0);
}

/* ARGSUSED */
int
pcie_close(dev_info_t *dip, dev_t dev, int flags, int otyp, cred_t *credp)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (otyp != OTYP_CHR)
		return (EINVAL);

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;

	return (0);
}

/* ARGSUSED */
int
pcie_ioctl(dev_info_t *dip, dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *credp, int *rvalp)
{
	struct devctl_iocdata *dcp;
	uint_t bus_state;
	int rv = DDI_SUCCESS;

	/*
	 * We can use the generic implementation for devctl ioctl
	 */
	switch (cmd) {
	case DEVCTL_DEVICE_GETSTATE:
	case DEVCTL_DEVICE_ONLINE:
	case DEVCTL_DEVICE_OFFLINE:
	case DEVCTL_BUS_GETSTATE:
		return (ndi_devctl_ioctl(dip, cmd, arg, mode, 0));
	default:
		break;
	}

	/*
	 * read devctl ioctl data
	 */
	if (ndi_dc_allochdl((void *)arg, &dcp) != NDI_SUCCESS)
		return (EFAULT);

	switch (cmd) {
	case DEVCTL_BUS_QUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_QUIESCED)
				break;
		(void) ndi_set_bus_state(dip, BUS_QUIESCED);
		break;
	case DEVCTL_BUS_UNQUIESCE:
		if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS)
			if (bus_state == BUS_ACTIVE)
				break;
		(void) ndi_set_bus_state(dip, BUS_ACTIVE);
		break;
	case DEVCTL_BUS_RESET:
	case DEVCTL_BUS_RESETALL:
	case DEVCTL_DEVICE_RESET:
		rv = ENOTSUP;
		break;
	default:
		rv = ENOTTY;
	}

	ndi_dc_freehdl(dcp);
	return (rv);
}

/* ARGSUSED */
int
pcie_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t
    prop_op,
    int flags, char *name, caddr_t valuep, int *lengthp)
{
	if (dev == DDI_DEV_T_ANY)
		goto skip;

	if (PCIE_IS_HOTPLUG_CAPABLE(dip) &&
	    strcmp(name, "pci-occupant") == 0) {
		int pci_dev = PCI_MINOR_NUM_TO_PCI_DEVNUM(getminor(dev));

		pcie_hp_create_occupant_props(dip, dev, pci_dev);
	}

skip:
	return (ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp));
}

int
pcie_init_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t *bus_p;
	ddi_acc_handle_t eh = NULL;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL)
		return (DDI_FAILURE);

	/* Create a config access handle dedicated to error handling */
	if (pci_config_setup(cdip, &eh) != DDI_SUCCESS) {
		cmn_err(CE_WARN, "Cannot setup config access"
		    " for BDF 0x%x\n", bus_p->bus_bdf);
		return (DDI_FAILURE);
	}

	bus_p->bus_cfg_hdl = eh;
	return (DDI_SUCCESS);
}

void
pcie_fini_cfghdl(dev_info_t *cdip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip);

	pci_config_teardown(&bus_p->bus_cfg_hdl);
}

void
pcie_determine_serial(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	ddi_acc_handle_t h;
	uint16_t cap;
	uchar_t serial[8];
	uint32_t low, high;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	h = bus_p->bus_cfg_hdl;

	if ((PCI_CAP_LOCATE(h, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_SER), &cap)) ==
	    DDI_FAILURE)
		return;

	high = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_UPPER_DW);
	low = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_LOWER_DW);

	/*
	 * Here, we're trying to figure out if we had an invalid PCIe read. From
	 * looking at the contents of the value, it can be hard to tell the
	 * difference between a value that has all 1s correctly versus if we had
	 * an error. In this case, we only assume it's invalid if both register
	 * reads are invalid.
	 * We also only use 32-bit reads as we're not sure if
	 * all devices will support these as 64-bit reads, while we know that
	 * they'll support these as 32-bit reads.
	 */
	if (high == PCI_EINVAL32 && low == PCI_EINVAL32)
		return;

	serial[0] = low & 0xff;
	serial[1] = (low >> 8) & 0xff;
	serial[2] = (low >> 16) & 0xff;
	serial[3] = (low >> 24) & 0xff;
	serial[4] = high & 0xff;
	serial[5] = (high >> 8) & 0xff;
	serial[6] = (high >> 16) & 0xff;
	serial[7] = (high >> 24) & 0xff;

	(void) ndi_prop_update_byte_array(DDI_DEV_T_NONE, dip, "pcie-serial",
	    serial, sizeof (serial));
}

static void
pcie_determine_aspm(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint32_t linkcap;
	uint16_t linkctl;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);

	switch (linkcap & PCIE_LINKCAP_ASPM_SUP_MASK) {
	case PCIE_LINKCAP_ASPM_SUP_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l1");
		break;
	case PCIE_LINKCAP_ASPM_SUP_L0S_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-support", "l0s,l1");
		break;
	default:
		return;
	}

	switch (linkctl & PCIE_LINKCTL_ASPM_CTL_MASK) {
	case PCIE_LINKCTL_ASPM_CTL_DIS:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "disabled");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l1");
		break;
	case PCIE_LINKCTL_ASPM_CTL_L0S_L1:
		(void) ndi_prop_update_string(DDI_DEV_T_NONE, dip,
		    "pcie-aspm-state", "l0s,l1");
		break;
	}
}

/*
 * PCI-Express child device initialization. Note, this will only be called on a
 * device or function if we actually attach a device driver to it.
 *
 * This function enables generic pci-express interrupts and error handling.
 * Note, tagging, the max packet size, and related settings are all set up
 * before this point by pcie_fabric_setup().
 *
 * @param cdip		child's dip (device's dip)
 * @return		DDI_SUCCESS or DDI_FAILURE
 */
/* ARGSUSED */
int
pcie_initchild(dev_info_t *cdip)
{
	uint16_t tmp16, reg16;
	pcie_bus_t *bus_p;
	uint32_t devid, venid;

	bus_p = PCIE_DIP2BUS(cdip);
	if (bus_p == NULL) {
		PCIE_DBG("%s: BUS not found.\n",
		    ddi_driver_name(cdip));

		return (DDI_FAILURE);
	}

	if (pcie_init_cfghdl(cdip) != DDI_SUCCESS)
		return (DDI_FAILURE);

	/*
	 * Update pcie_bus_t with real Vendor Id Device Id.
	 *
	 * For assigned devices in IOV environment, the OBP will return
	 * faked device id/vendor id on configuration read and for both
	 * properties in root domain. translate_devid() function will
	 * update the properties with real device-id/vendor-id on such
	 * platforms, so that we can utilize the properties here to get
	 * real device-id/vendor-id and overwrite the faked ids.
	 *
	 * For unassigned devices or devices in non-IOV environment, the
	 * operation below won't make a difference.
	 *
	 * The IOV implementation only supports assignment of PCIE
	 * endpoint devices. Devices under pci-pci bridges don't need
	 * operation like this.
	 */
	devid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "device-id", -1);
	venid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS,
	    "vendor-id", -1);
	bus_p->bus_dev_ven_id = (devid << 16) | (venid & 0xffff);

	/* Clear the device's status register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_STAT);
	PCIE_PUT(16, bus_p, PCI_CONF_STAT, reg16);

	/* Setup the device's command register */
	reg16 = PCIE_GET(16, bus_p, PCI_CONF_COMM);
	tmp16 = (reg16 & pcie_command_default_fw) | pcie_command_default;

#if defined(__x86)
	boolean_t empty_io_range = B_FALSE;
	boolean_t empty_mem_range = B_FALSE;
	/*
	 * Check for empty IO and Mem ranges on bridges. If so disable IO/Mem
	 * access as it can cause a hang if enabled.
	 */
	pcie_check_io_mem_range(bus_p->bus_cfg_hdl, &empty_io_range,
	    &empty_mem_range);
	if ((empty_io_range == B_TRUE) &&
	    (pcie_command_default & PCI_COMM_IO)) {
		tmp16 &= ~PCI_COMM_IO;
		PCIE_DBG("No I/O range found for %s, bdf 0x%x\n",
		    ddi_driver_name(cdip), bus_p->bus_bdf);
	}
	if ((empty_mem_range == B_TRUE) &&
	    (pcie_command_default & PCI_COMM_MAE)) {
		tmp16 &= ~PCI_COMM_MAE;
		PCIE_DBG("No Mem range found for %s, bdf 0x%x\n",
		    ddi_driver_name(cdip), bus_p->bus_bdf);
	}
#endif /* defined(__x86) */

	if (pcie_serr_disable_flag && PCIE_IS_PCIE(bus_p))
		tmp16 &= ~PCI_COMM_SERR_ENABLE;

	PCIE_PUT(16, bus_p, PCI_CONF_COMM, tmp16);
	PCIE_DBG_CFG(cdip, bus_p, "COMMAND", 16, PCI_CONF_COMM, reg16);

	/*
	 * If the device has a bus control register then program it
	 * based on the settings in the command register.
	 */
	if (PCIE_IS_BDG(bus_p)) {
		/* Clear the device's secondary status register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_SEC_STATUS);
		PCIE_PUT(16, bus_p, PCI_BCNF_SEC_STATUS, reg16);

		/* Setup the device's secondary command register */
		reg16 = PCIE_GET(16, bus_p, PCI_BCNF_BCNTRL);
		tmp16 = (reg16 & pcie_bdg_command_default_fw);

		tmp16 |= PCI_BCNF_BCNTRL_SERR_ENABLE;
		/*
		 * Workaround for this Nvidia bridge. Don't enable the SERR
		 * enable bit in the bridge control register as it could lead to
		 * bogus NMIs.
		 */
		if (bus_p->bus_dev_ven_id == 0x037010DE)
			tmp16 &= ~PCI_BCNF_BCNTRL_SERR_ENABLE;

		if (pcie_command_default & PCI_COMM_PARITY_DETECT)
			tmp16 |= PCI_BCNF_BCNTRL_PARITY_ENABLE;

		/*
		 * Enable Master Abort Mode only if URs have not been masked.
		 * For PCI and PCIe-PCI bridges, enabling this bit causes
		 * Master Aborts/URs to be forwarded as a UR/TA or SERR. If this
		 * bit is masked, posted requests are dropped and non-posted
		 * requests are returned with -1.
		 */
		if (pcie_aer_uce_mask & PCIE_AER_UCE_UR)
			tmp16 &= ~PCI_BCNF_BCNTRL_MAST_AB_MODE;
		else
			tmp16 |= PCI_BCNF_BCNTRL_MAST_AB_MODE;
		PCIE_PUT(16, bus_p, PCI_BCNF_BCNTRL, tmp16);
		PCIE_DBG_CFG(cdip, bus_p, "SEC CMD", 16, PCI_BCNF_BCNTRL,
		    reg16);
	}

	if (PCIE_IS_PCIE(bus_p)) {
		/* Setup PCIe device control register */
		reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
		/* note: MPS/MRRS are initialized in pcie_initchild_mps() */
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK));
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(cdip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);

		/* Enable PCIe errors */
		pcie_enable_errors(cdip);

		pcie_determine_serial(cdip);

		pcie_determine_aspm(cdip);

		pcie_capture_speeds(cdip);
	}

	bus_p->bus_ari = B_FALSE;
	if ((pcie_ari_is_enabled(ddi_get_parent(cdip))
	    == PCIE_ARI_FORW_ENABLED) && (pcie_ari_device(cdip)
	    == PCIE_ARI_DEVICE)) {
		bus_p->bus_ari = B_TRUE;
	}

	return (DDI_SUCCESS);
}

static void
pcie_init_pfd(dev_info_t *dip)
{
	pf_data_t *pfd_p = PCIE_ZALLOC(pf_data_t);
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DIP2PFD(dip) = pfd_p;

	pfd_p->pe_bus_p = bus_p;
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	/* Allocate the root fault struct for both RC and RP */
	if (PCIE_IS_ROOT(bus_p)) {
		PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
		PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
		PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t);
	}

	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;

	if (PCIE_IS_BDG(bus_p))
		PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);

	if (PCIE_IS_PCIE(bus_p)) {
		PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);

		if (PCIE_IS_RP(bus_p))
			PCIE_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_rp_err_regs_t);

		PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
		PCIE_ADV_REG(pfd_p)->pcie_ue_tgt_bdf = PCIE_INVALID_BDF;

		if (PCIE_IS_RP(bus_p)) {
			PCIE_ADV_RP_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id =
			    PCIE_INVALID_BDF;
			PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id =
			    PCIE_INVALID_BDF;
		} else if (PCIE_IS_PCIE_BDG(bus_p)) {
			PCIE_ADV_BDG_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcie_adv_bdg_err_regs_t);
			PCIE_ADV_BDG_REG(pfd_p)->pcie_sue_tgt_bdf =
			    PCIE_INVALID_BDF;
		}

		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		}

		PCIE_SLOT_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_slot_regs_t);
		PCIE_SLOT_REG(pfd_p)->pcie_slot_regs_valid = B_FALSE;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_cap = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_control = 0;
		PCIE_SLOT_REG(pfd_p)->pcie_slot_status = 0;

	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			PCIX_BDG_ERR_REG(pfd_p) =
			    PCIE_ZALLOC(pf_pcix_bdg_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				PCIX_BDG_ECC_REG(pfd_p, 0) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
				PCIX_BDG_ECC_REG(pfd_p, 1) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
			}
		} else {
			PCIX_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcix_err_regs_t);

			if (PCIX_ECC_VERSION_CHECK(bus_p))
				PCIX_ECC_REG(pfd_p) =
				    PCIE_ZALLOC(pf_pcix_ecc_regs_t);
		}
	}
}

static void
pcie_fini_pfd(dev_info_t *dip)
{
	pf_data_t *pfd_p = PCIE_DIP2PFD(dip);
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (PCIE_IS_PCIE(bus_p)) {
		if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		}

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_ADV_RP_REG(pfd_p),
			    sizeof (pf_pcie_adv_rp_err_regs_t));
		else if (PCIE_IS_PCIE_BDG(bus_p))
			kmem_free(PCIE_ADV_BDG_REG(pfd_p),
			    sizeof (pf_pcie_adv_bdg_err_regs_t));

		kmem_free(PCIE_ADV_REG(pfd_p),
		    sizeof (pf_pcie_adv_err_regs_t));

		if (PCIE_IS_RP(bus_p))
			kmem_free(PCIE_RP_REG(pfd_p),
			    sizeof (pf_pcie_rp_err_regs_t));

		kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t));
	} else if (PCIE_IS_PCIX(bus_p)) {
		if (PCIE_IS_BDG(bus_p)) {
			if (PCIX_ECC_VERSION_CHECK(bus_p)) {
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0),
				    sizeof (pf_pcix_ecc_regs_t));
				kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1),
				    sizeof (pf_pcix_ecc_regs_t));
			}

			kmem_free(PCIX_BDG_ERR_REG(pfd_p),
			    sizeof (pf_pcix_bdg_err_regs_t));
		} else {
			if (PCIX_ECC_VERSION_CHECK(bus_p))
				kmem_free(PCIX_ECC_REG(pfd_p),
				    sizeof (pf_pcix_ecc_regs_t));

			kmem_free(PCIX_ERR_REG(pfd_p),
			    sizeof (pf_pcix_err_regs_t));
		}
	}

	if (PCIE_IS_BDG(bus_p))
		kmem_free(PCI_BDG_ERR_REG(pfd_p),
		    sizeof (pf_pci_bdg_err_regs_t));

	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));

	if (PCIE_IS_ROOT(bus_p)) {
		kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
		kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
	}

	kmem_free(PCIE_DIP2PFD(dip), sizeof (pf_data_t));

	PCIE_DIP2PFD(dip) = NULL;
}


/*
 * Special functions to allocate pf_data_t's for PCIe root complexes.
 * Note: Root Complex not Root Port
 */
void
pcie_rc_init_pfd(dev_info_t *dip, pf_data_t *pfd_p)
{
	pfd_p->pe_bus_p = PCIE_DIP2DOWNBUS(dip);
	pfd_p->pe_severity_flags = 0;
	pfd_p->pe_severity_mask = 0;
	pfd_p->pe_orig_severity_flags = 0;
	pfd_p->pe_lock = B_FALSE;
	pfd_p->pe_valid = B_FALSE;

	PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t);
	PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF;
	PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t);
	PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t);
	PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t);
	PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF;
	PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t);
	PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t);
	PCIE_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_rp_err_regs_t);
	PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t);
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id = PCIE_INVALID_BDF;
	PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id = PCIE_INVALID_BDF;

	PCIE_ADV_REG(pfd_p)->pcie_ue_sev = pcie_aer_uce_severity;
}

void
pcie_rc_fini_pfd(pf_data_t *pfd_p)
{
	kmem_free(PCIE_ADV_RP_REG(pfd_p), sizeof (pf_pcie_adv_rp_err_regs_t));
	kmem_free(PCIE_ADV_REG(pfd_p), sizeof (pf_pcie_adv_err_regs_t));
	kmem_free(PCIE_RP_REG(pfd_p), sizeof (pf_pcie_rp_err_regs_t));
	kmem_free(PCIE_ERR_REG(pfd_p), sizeof
	    (pf_pcie_err_regs_t));
	kmem_free(PCI_BDG_ERR_REG(pfd_p), sizeof (pf_pci_bdg_err_regs_t));
	kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t));
	kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t));
	kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t));
	kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t));
}

/*
 * init pcie_bus_t for root complex
 *
 * Only a few of the fields in pcie_bus_t are valid for a root complex.
 * The fields that are bracketed are initialized in this routine:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		<bus_fm_flags>
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		bus_pcie_off
 * uint16_t		<bus_aer_off>
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t *	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	<bus_dom>
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void *		bus_plat_private
 */
void
pcie_rc_init_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p;

	bus_p = (pcie_bus_t *)kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);
	bus_p->bus_dip = dip;
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_RC_PSEUDO;
	bus_p->bus_hdr_type = PCI_HEADER_ONE;

	/* Fake that there are AER logs */
	bus_p->bus_aer_off = (uint16_t)-1;

	/* Needed only for handle lookup */
	atomic_or_uint(&bus_p->bus_fm_flags, PF_FM_READY);

	ndi_set_bus_private(dip, B_FALSE, DEVI_PORT_TYPE_PCI, bus_p);

	PCIE_BUS2DOM(bus_p) = PCIE_ZALLOC(pcie_domain_t);
}

void
pcie_rc_fini_bus(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2DOWNBUS(dip);
	ndi_set_bus_private(dip, B_FALSE, 0, NULL);
	kmem_free(PCIE_BUS2DOM(bus_p), sizeof (pcie_domain_t));
	kmem_free(bus_p, sizeof (pcie_bus_t));
}

static int
pcie_width_to_int(pcie_link_width_t width)
{
	switch (width) {
	case PCIE_LINK_WIDTH_X1:
		return (1);
	case PCIE_LINK_WIDTH_X2:
		return (2);
	case PCIE_LINK_WIDTH_X4:
		return (4);
	case PCIE_LINK_WIDTH_X8:
		return (8);
	case PCIE_LINK_WIDTH_X12:
		return (12);
	case PCIE_LINK_WIDTH_X16:
		return (16);
	case PCIE_LINK_WIDTH_X32:
		return (32);
	default:
		return (0);
	}
}

/*
 * Return the speed in Transfers / second. This is a signed quantity to match
 * the ndi/ddi property interfaces.
 */
static int64_t
pcie_speed_to_int(pcie_link_speed_t speed)
{
	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		return (2500000000LL);
	case PCIE_LINK_SPEED_5:
		return (5000000000LL);
	case PCIE_LINK_SPEED_8:
		return (8000000000LL);
	case PCIE_LINK_SPEED_16:
		return (16000000000LL);
	case PCIE_LINK_SPEED_32:
		return (32000000000LL);
	case PCIE_LINK_SPEED_64:
		return (64000000000LL);
	default:
		return (0);
	}
}

/*
 * Translate the recorded speed information into devinfo properties.
 */
static void
pcie_speeds_to_devinfo(dev_info_t *dip, pcie_bus_t *bus_p)
{
	if (bus_p->bus_max_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-width",
		    pcie_width_to_int(bus_p->bus_max_width));
	}

	if (bus_p->bus_cur_width != PCIE_LINK_WIDTH_UNKNOWN) {
		(void) ndi_prop_update_int(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-width",
		    pcie_width_to_int(bus_p->bus_cur_width));
	}

	if (bus_p->bus_cur_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-current-speed",
		    pcie_speed_to_int(bus_p->bus_cur_speed));
	}

	if (bus_p->bus_max_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-maximum-speed",
		    pcie_speed_to_int(bus_p->bus_max_speed));
	}

	if (bus_p->bus_target_speed != PCIE_LINK_SPEED_UNKNOWN) {
		(void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip,
		    "pcie-link-target-speed",
		    pcie_speed_to_int(bus_p->bus_target_speed));
	}

	if ((bus_p->bus_speed_flags & PCIE_LINK_F_ADMIN_TARGET) != 0) {
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "pcie-link-admin-target-speed");
	}

	if (bus_p->bus_sup_speed != PCIE_LINK_SPEED_UNKNOWN) {
		int64_t speeds[PCIE_NSPEEDS];
		uint_t nspeeds = 0;

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_2_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_2_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_5) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_5);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_8) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_8);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_16) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_16);
		}

		if (bus_p->bus_sup_speed &
		    PCIE_LINK_SPEED_32) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_32);
		}

		if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_64) {
			speeds[nspeeds++] =
			    pcie_speed_to_int(PCIE_LINK_SPEED_64);
		}

		(void) ndi_prop_update_int64_array(DDI_DEV_T_NONE, dip,
		    "pcie-link-supported-speeds", speeds, nspeeds);
	}
}

/*
 * We need to capture the supported, maximum, and current device speed and
 * width. The way that this has been done has changed over time.
 *
 * Prior to PCIe Gen 3, there were only current and supported speed fields.
 * These were found in the link status and link capabilities registers of the
 * PCI express capability. With the change to PCIe Gen 3, the information in the
 * link capabilities changed to the maximum value. The supported speeds vector
 * was moved to the link capabilities 2 register.
 *
 * Now, a device may not implement some of these registers. To determine whether
 * or not it's here, we have to do the following. First, we need to check the
 * revision of the PCI express capability. The link capabilities 2 register did
 * not exist prior to version 2 of this capability. If a modern device does not
 * implement it, it is supposed to return zero for the register.
 */
static void
pcie_capture_speeds(dev_info_t *dip)
{
	uint16_t vers, status;
	uint32_t cap, cap2, ctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	dev_info_t *rcdip;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	rcdip = pcie_get_rc_dip(dip);
	if (bus_p->bus_cfg_hdl == NULL) {
		vers = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_PCIECAP);
	} else {
		vers = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP);
	}
	if (vers == PCI_EINVAL16)
		return;
	vers &= PCIE_PCIECAP_VER_MASK;

	/*
	 * Verify the capability's version.
	 */
	switch (vers) {
	case PCIE_PCIECAP_VER_1_0:
		cap2 = 0;
		ctl2 = 0;
		break;
	case PCIE_PCIECAP_VER_2_0:
		if (bus_p->bus_cfg_hdl == NULL) {
			cap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCAP2);
			ctl2 = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_LINKCTL2);
		} else {
			cap2 = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP2);
			ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
		}
		if (cap2 == PCI_EINVAL32)
			cap2 = 0;
		if (ctl2 == PCI_EINVAL16)
			ctl2 = 0;
		break;
	default:
		/* Don't try and handle an unknown version */
		return;
	}

	if (bus_p->bus_cfg_hdl == NULL) {
		status = pci_cfgacc_get16(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKSTS);
		cap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_LINKCAP);
	} else {
		status = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		cap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	}
	if (status == PCI_EINVAL16 || cap == PCI_EINVAL32)
		return;

	mutex_enter(&bus_p->bus_speed_mutex);

	switch (status & PCIE_LINKSTS_SPEED_MASK) {
	case PCIE_LINKSTS_SPEED_2_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKSTS_SPEED_5:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKSTS_SPEED_8:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKSTS_SPEED_16:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKSTS_SPEED_32:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKSTS_SPEED_64:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_cur_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	switch (status & PCIE_LINKSTS_NEG_WIDTH_MASK) {
	case PCIE_LINKSTS_NEG_WIDTH_X1:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X2:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X4:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X8:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X12:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X16:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKSTS_NEG_WIDTH_X32:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_cur_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	switch (cap & PCIE_LINKCAP_MAX_WIDTH_MASK) {
	case PCIE_LINKCAP_MAX_WIDTH_X1:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X1;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X2:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X2;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X4:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X4;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X8:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X8;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X12:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X12;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X16:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X16;
		break;
	case PCIE_LINKCAP_MAX_WIDTH_X32:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_X32;
		break;
	default:
		bus_p->bus_max_width = PCIE_LINK_WIDTH_UNKNOWN;
		break;
	}

	/*
	 * If we have the Link Capabilities 2, then we can get the supported
	 * speeds from it and treat the bits in Link Capabilities 1 as the
	 * maximum. If we don't, then we need to follow the Implementation Note
	 * in the standard under Link Capabilities 2. Effectively, this means
	 * that if the value 10b is set in the Link Capabilities register, then
	 * it supports both 2.5 and 5 GT/s speeds.
	 */
	if (cap2 != 0) {
		if (cap2 & PCIE_LINKCAP2_SPEED_2_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_2_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_5)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_5;
		if (cap2 & PCIE_LINKCAP2_SPEED_8)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_8;
		if (cap2 & PCIE_LINKCAP2_SPEED_16)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_16;
		if (cap2 & PCIE_LINKCAP2_SPEED_32)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_32;
		if (cap2 & PCIE_LINKCAP2_SPEED_64)
			bus_p->bus_sup_speed |= PCIE_LINK_SPEED_64;

		switch (cap & PCIE_LINKCAP_MAX_SPEED_MASK) {
		case PCIE_LINKCAP_MAX_SPEED_2_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_5:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			break;
		case PCIE_LINKCAP_MAX_SPEED_8:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_8;
			break;
		case PCIE_LINKCAP_MAX_SPEED_16:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_16;
			break;
		case PCIE_LINKCAP_MAX_SPEED_32:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_32;
			break;
		case PCIE_LINKCAP_MAX_SPEED_64:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_64;
			break;
		default:
			bus_p->bus_max_speed = PCIE_LINK_SPEED_UNKNOWN;
			break;
		}
	} else {
		if (cap & PCIE_LINKCAP_MAX_SPEED_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5 |
			    PCIE_LINK_SPEED_5;
		} else if (cap & PCIE_LINKCAP_MAX_SPEED_2_5) {
			bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5;
			bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5;
		}
	}

	switch (ctl2 & PCIE_LINKCTL2_TARGET_SPEED_MASK) {
	case PCIE_LINKCTL2_TARGET_SPEED_2_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_2_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_5:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_5;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_8:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_8;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_16:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_16;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_32:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_32;
		break;
	case PCIE_LINKCTL2_TARGET_SPEED_64:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_64;
		break;
	default:
		bus_p->bus_target_speed = PCIE_LINK_SPEED_UNKNOWN;
		break;
	}

	pcie_speeds_to_devinfo(dip, bus_p);
	mutex_exit(&bus_p->bus_speed_mutex);
}

/*
 * Partially initialize the pcie_bus_t for device (dip, bdf) so that PCI
 * config space can be accessed.
 *
 * This routine is invoked during boot, either after creating a devinfo node
 * (x86 case) or during px driver attach (sparc case); it is also invoked
 * in hotplug context after a devinfo node is created.
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_INITIAL
 * is set:
 *
 * dev_info_t *		<bus_dip>
 * dev_info_t *		<bus_rp_dip>
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	<bus_bdf>
 * pcie_req_id_t	<bus_rp_bdf>
 * uint32_t		<bus_dev_ven_id>
 * uint8_t		<bus_rev_id>
 * uint8_t		<bus_hdr_type>
 * uint16_t		<bus_dev_type>
 * uint8_t		bus_bdg_secbus
 * uint16_t		<bus_pcie_off>
 * uint16_t		<bus_aer_off>
 * uint16_t		<bus_pcix_off>
 * uint16_t		<bus_ecc_ver>
 * pci_bus_range_t	bus_bus_range
 * ppb_ranges_t	*	bus_addr_ranges
 * int			bus_addr_entries
 * pci_regspec_t *	bus_assigned_addr
 * int			bus_assigned_entries
 * pf_data_t *		bus_pfd
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void	*		bus_plat_private
 *
 * The fields that are bracketed are initialized if flag PCIE_BUS_FINAL
 * is set:
 *
 * dev_info_t *		bus_dip
 * dev_info_t *		bus_rp_dip
 * ddi_acc_handle_t	bus_cfg_hdl
 * uint_t		bus_fm_flags
 * pcie_req_id_t	bus_bdf
 * pcie_req_id_t	bus_rp_bdf
 * uint32_t		bus_dev_ven_id
 * uint8_t		bus_rev_id
 * uint8_t		bus_hdr_type
 * uint16_t		bus_dev_type
 * uint8_t		<bus_bdg_secbus>
 * uint16_t		bus_pcie_off
 * uint16_t		bus_aer_off
 * uint16_t		bus_pcix_off
 * uint16_t		bus_ecc_ver
 * pci_bus_range_t	<bus_bus_range>
 * ppb_ranges_t	*	<bus_addr_ranges>
 * int			<bus_addr_entries>
 * pci_regspec_t *	<bus_assigned_addr>
 * int			<bus_assigned_entries>
 * pf_data_t *		<bus_pfd>
 * pcie_domain_t *	bus_dom
 * int			bus_mps
 * uint64_t		bus_cfgacc_base
 * void	*		<bus_plat_private>
 */

pcie_bus_t *
pcie_init_bus(dev_info_t *dip, pcie_req_id_t bdf, uint8_t flags)
{
	uint16_t status, base, baseptr, num_cap;
	uint32_t capid;
	int range_size;
	pcie_bus_t *bus_p = NULL;
	dev_info_t *rcdip;
	dev_info_t *pdip;
	const char *errstr = NULL;

	if (!(flags & PCIE_BUS_INITIAL))
		goto initial_done;

	bus_p = kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP);

	bus_p->bus_dip = dip;
	bus_p->bus_bdf = bdf;

	rcdip = pcie_get_rc_dip(dip);
	ASSERT(rcdip != NULL);

	/* Save the Vendor ID, Device ID and revision ID */
	bus_p->bus_dev_ven_id = pci_cfgacc_get32(rcdip, bdf, PCI_CONF_VENID);
	bus_p->bus_rev_id = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_REVID);
	/* Save the Header Type */
	bus_p->bus_hdr_type = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_HEADER);
	bus_p->bus_hdr_type &= PCI_HEADER_TYPE_M;

	/*
	 * Figure out the device type and all the relevant capability offsets
	 */
	/* set default value */
	bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_PCI_PSEUDO;

	status = pci_cfgacc_get16(rcdip, bdf, PCI_CONF_STAT);
	if (status == PCI_CAP_EINVAL16 || !(status & PCI_STAT_CAP))
		goto caps_done; /* capability not supported */

	/* Relevant conventional capabilities first */

	/* Conventional caps: PCI_CAP_ID_PCI_E, PCI_CAP_ID_PCIX */
	num_cap = 2;

	switch (bus_p->bus_hdr_type) {
	case PCI_HEADER_ZERO:
		baseptr = PCI_CONF_CAP_PTR;
		break;
	case PCI_HEADER_PPB:
		baseptr = PCI_BCNF_CAP_PTR;
		break;
	case PCI_HEADER_CARDBUS:
		baseptr = PCI_CBUS_CAP_PTR;
		break;
	default:
		cmn_err(CE_WARN, "%s: unexpected pci header type:%x",
		    __func__, bus_p->bus_hdr_type);
		goto caps_done;
	}

	base = baseptr;
	for (base = pci_cfgacc_get8(rcdip, bdf, base); base && num_cap;
	    base = pci_cfgacc_get8(rcdip, bdf, base + PCI_CAP_NEXT_PTR)) {
		capid = pci_cfgacc_get8(rcdip, bdf, base);
		uint16_t pcap;

		switch (capid) {
		case PCI_CAP_ID_PCI_E:
			bus_p->bus_pcie_off = base;
			pcap = pci_cfgacc_get16(rcdip, bdf, base +
			    PCIE_PCIECAP);
			bus_p->bus_dev_type = pcap & PCIE_PCIECAP_DEV_TYPE_MASK;
			bus_p->bus_pcie_vers = pcap & PCIE_PCIECAP_VER_MASK;

			/* Check and save PCIe hotplug capability information */
			if ((PCIE_IS_RP(bus_p) || PCIE_IS_SWD(bus_p)) &&
			    (pci_cfgacc_get16(rcdip, bdf, base + PCIE_PCIECAP)
			    & PCIE_PCIECAP_SLOT_IMPL) &&
			    (pci_cfgacc_get32(rcdip, bdf, base + PCIE_SLOTCAP)
			    & PCIE_SLOTCAP_HP_CAPABLE))
				bus_p->bus_hp_sup_modes |= PCIE_NATIVE_HP_MODE;

			num_cap--;
			break;
		case PCI_CAP_ID_PCIX:
			bus_p->bus_pcix_off = base;
			if (PCIE_IS_BDG(bus_p))
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_SEC_STATUS) & PCI_PCIX_VER_MASK;
			else
				bus_p->bus_ecc_ver =
				    pci_cfgacc_get16(rcdip, bdf, base +
				    PCI_PCIX_COMMAND) & PCI_PCIX_VER_MASK;
			num_cap--;
			break;
		default:
			break;
		}
	}

	/* Check and save PCI hotplug (SHPC) capability information */
	if (PCIE_IS_BDG(bus_p)) {
		base = baseptr;
		for (base = pci_cfgacc_get8(rcdip, bdf, base);
		    base; base = pci_cfgacc_get8(rcdip, bdf,
		    base + PCI_CAP_NEXT_PTR)) {
			capid = pci_cfgacc_get8(rcdip, bdf, base);
			if (capid == PCI_CAP_ID_PCI_HOTPLUG) {
				bus_p->bus_pci_hp_off = base;
				bus_p->bus_hp_sup_modes |= PCIE_PCI_HP_MODE;
				break;
			}
		}
	}

	/* Then, relevant extended capabilities */

	if (!PCIE_IS_PCIE(bus_p))
		goto caps_done;

	/* Extended caps: PCIE_EXT_CAP_ID_AER, PCIE_EXT_CAP_ID_DEV3 */
	for (base = PCIE_EXT_CAP; base; base = (capid >>
	    PCIE_EXT_CAP_NEXT_PTR_SHIFT) & PCIE_EXT_CAP_NEXT_PTR_MASK) {
		capid = pci_cfgacc_get32(rcdip, bdf, base);
		if (capid == PCI_CAP_EINVAL32)
			break;
		switch ((capid >> PCIE_EXT_CAP_ID_SHIFT) &
		    PCIE_EXT_CAP_ID_MASK) {
		case PCIE_EXT_CAP_ID_AER:
			bus_p->bus_aer_off = base;
			break;
		case PCIE_EXT_CAP_ID_DEV3:
			bus_p->bus_dev3_off = base;
			break;
		}
	}

caps_done:
	/* save RP dip and RP bdf */
	if (PCIE_IS_RP(bus_p)) {
		bus_p->bus_rp_dip = dip;
		bus_p->bus_rp_bdf = bus_p->bus_bdf;

		bus_p->bus_fab = PCIE_ZALLOC(pcie_fabric_data_t);
	} else {
		for (pdip = ddi_get_parent(dip); pdip;
		    pdip = ddi_get_parent(pdip)) {
			pcie_bus_t *parent_bus_p = PCIE_DIP2BUS(pdip);

			/*
			 * If RP dip and RP bdf in parent's bus_t have
			 * been initialized, simply use these instead of
			 * continuing up to the RC.
			 */
			if (parent_bus_p->bus_rp_dip != NULL) {
				bus_p->bus_rp_dip = parent_bus_p->bus_rp_dip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_rp_bdf;
				break;
			}

			/*
			 * When debugging be aware that some NVIDIA x86
			 * architectures have 2 nodes for each RP, one at Bus
			 * 0x0 and one at Bus 0x80. The requester is from Bus
			 * 0x80.
			 */
			if (PCIE_IS_ROOT(parent_bus_p)) {
				bus_p->bus_rp_dip = pdip;
				bus_p->bus_rp_bdf = parent_bus_p->bus_bdf;
				break;
			}
		}
	}

	bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED;
	(void) atomic_swap_uint(&bus_p->bus_fm_flags, 0);

	ndi_set_bus_private(dip, B_TRUE, DEVI_PORT_TYPE_PCI, (void *)bus_p);

	if (PCIE_IS_HOTPLUG_CAPABLE(dip))
		(void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip,
		    "hotplug-capable");

initial_done:
	if (!(flags & PCIE_BUS_FINAL))
		goto final_done;

	/* already initialized? */
	bus_p = PCIE_DIP2BUS(dip);

	/* Save the Range information if device is a switch/bridge */
	if (PCIE_IS_BDG(bus_p)) {
		/* get "bus-range" property */
		range_size = sizeof (pci_bus_range_t);
		if (ddi_getlongprop_buf(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "bus-range", (caddr_t)&bus_p->bus_bus_range, &range_size)
		    != DDI_PROP_SUCCESS) {
			errstr = "Cannot find \"bus-range\" property";
			cmn_err(CE_WARN,
			    "PCIE init err info failed BDF 0x%x:%s\n",
			    bus_p->bus_bdf, errstr);
		}

		/* get secondary bus number */
		rcdip = pcie_get_rc_dip(dip);
		ASSERT(rcdip != NULL);

		bus_p->bus_bdg_secbus = pci_cfgacc_get8(rcdip,
		    bus_p->bus_bdf, PCI_BCNF_SECBUS);

		/* Get "ranges" property */
		if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
		    "ranges", (caddr_t)&bus_p->bus_addr_ranges,
		    &bus_p->bus_addr_entries) != DDI_PROP_SUCCESS)
			bus_p->bus_addr_entries = 0;
		bus_p->bus_addr_entries /= sizeof (ppb_ranges_t);
	}

	/* save "assigned-addresses" property array, ignore failures */
	if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "assigned-addresses", (caddr_t)&bus_p->bus_assigned_addr,
	    &bus_p->bus_assigned_entries) == DDI_PROP_SUCCESS)
		bus_p->bus_assigned_entries /= sizeof (pci_regspec_t);
	else
		bus_p->bus_assigned_entries = 0;

	pcie_init_pfd(dip);

	pcie_init_plat(dip);

	pcie_capture_speeds(dip);

final_done:

	PCIE_DBG("Add %s(dip 0x%p, bdf 0x%x, secbus 0x%x)\n",
	    ddi_driver_name(dip), (void *)dip, bus_p->bus_bdf,
	    bus_p->bus_bdg_secbus);
#ifdef DEBUG
	if (bus_p != NULL) {
		pcie_print_bus(bus_p);
	}
#endif

	return (bus_p);
}

/*
 * Invoked before destroying devinfo node, mostly during hotplug
 * operation to free pcie_bus_t data structure
 */
/* ARGSUSED */
void
pcie_fini_bus(dev_info_t *dip, uint8_t flags)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	ASSERT(bus_p);

	if (flags & PCIE_BUS_INITIAL) {
		pcie_fini_plat(dip);
		pcie_fini_pfd(dip);

		if (PCIE_IS_RP(bus_p)) {
			kmem_free(bus_p->bus_fab, sizeof (pcie_fabric_data_t));
			bus_p->bus_fab = NULL;
		}

		kmem_free(bus_p->bus_assigned_addr,
		    (sizeof (pci_regspec_t) * bus_p->bus_assigned_entries));
		kmem_free(bus_p->bus_addr_ranges,
		    (sizeof (ppb_ranges_t) * bus_p->bus_addr_entries));
		/* zero out the fields that have been destroyed */
		bus_p->bus_assigned_addr = NULL;
		bus_p->bus_addr_ranges = NULL;
		bus_p->bus_assigned_entries = 0;
		bus_p->bus_addr_entries = 0;
	}

	if (flags & PCIE_BUS_FINAL) {
		if (PCIE_IS_HOTPLUG_CAPABLE(dip)) {
			(void) ndi_prop_remove(DDI_DEV_T_NONE, dip,
			    "hotplug-capable");
		}

		ndi_set_bus_private(dip, B_TRUE, 0, NULL);
		kmem_free(bus_p, sizeof (pcie_bus_t));
	}
}

int
pcie_postattach_child(dev_info_t *cdip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip);

	if (!bus_p)
		return (DDI_FAILURE);

	return (pcie_enable_ce(cdip));
}

/*
 * PCI-Express child device de-initialization.
 * This function disables generic pci-express interrupts and error
 * handling.
 */
void
pcie_uninitchild(dev_info_t *cdip)
{
	pcie_disable_errors(cdip);
	pcie_fini_cfghdl(cdip);
	pcie_fini_dom(cdip);
}

/*
 * find the root complex dip
 */
dev_info_t *
pcie_get_rc_dip(dev_info_t *dip)
{
	dev_info_t *rcdip;
	pcie_bus_t *rc_bus_p;

	for (rcdip = ddi_get_parent(dip); rcdip;
	    rcdip = ddi_get_parent(rcdip)) {
		rc_bus_p = PCIE_DIP2BUS(rcdip);
		if (rc_bus_p && PCIE_IS_RC(rc_bus_p))
			break;
	}

	return (rcdip);
}

boolean_t
pcie_is_pci_device(dev_info_t *dip)
{
	dev_info_t *pdip;
	char *device_type;

	pdip = ddi_get_parent(dip);
	if (pdip == NULL)
		return (B_FALSE);

	if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, DDI_PROP_DONTPASS,
	    "device_type", &device_type) != DDI_PROP_SUCCESS)
		return (B_FALSE);

	if (strcmp(device_type, "pciex") != 0 &&
	    strcmp(device_type, "pci") != 0) {
		ddi_prop_free(device_type);
		return (B_FALSE);
	}

	ddi_prop_free(device_type);
	return (B_TRUE);
}

typedef struct {
	boolean_t init;
	uint8_t flags;
} pcie_bus_arg_t;

/*ARGSUSED*/
static int
pcie_fab_do_init_fini(dev_info_t *dip, void *arg)
{
	pcie_req_id_t bdf;
	pcie_bus_arg_t *bus_arg = (pcie_bus_arg_t *)arg;

	if (!pcie_is_pci_device(dip))
		goto out;

	if (bus_arg->init) {
		if (pcie_get_bdf_from_dip(dip, &bdf) != DDI_SUCCESS)
			goto out;

		(void) pcie_init_bus(dip, bdf, bus_arg->flags);
	} else {
		(void) pcie_fini_bus(dip, bus_arg->flags);
	}

	return (DDI_WALK_CONTINUE);

out:
	return (DDI_WALK_PRUNECHILD);
}

void
pcie_fab_init_bus(dev_info_t *rcdip, uint8_t flags)
{
	int circular_count;
	dev_info_t *dip = ddi_get_child(rcdip);
	pcie_bus_arg_t arg;

	arg.init = B_TRUE;
	arg.flags = flags;

	ndi_devi_enter(rcdip, &circular_count);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip, circular_count);
}

void
pcie_fab_fini_bus(dev_info_t *rcdip, uint8_t flags)
{
	int circular_count;
	dev_info_t *dip = ddi_get_child(rcdip);
	pcie_bus_arg_t arg;

	arg.init = B_FALSE;
	arg.flags = flags;

	ndi_devi_enter(rcdip, &circular_count);
	ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg);
	ndi_devi_exit(rcdip, circular_count);
}

void
pcie_enable_errors(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t reg16, tmp16;
	uint32_t reg32, tmp32;

	ASSERT(bus_p);

	/*
	 * Clear any pending errors
	 */
	pcie_clear_errors(dip);

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Enable Baseline Error Handling but leave CE reporting off (poweron
	 * default).
	 */
	if ((reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL)) !=
	    PCI_CAP_EINVAL16) {
		tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK |
		    PCIE_DEVCTL_MAX_PAYLOAD_MASK)) |
		    (pcie_base_err_default & (~PCIE_DEVCTL_CE_REPORTING_EN));

		PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16);
	}

	/* Enable Root Port Baseline Error Receiving */
	if (PCIE_IS_ROOT(bus_p) &&
	    (reg16 = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL)) !=
	    PCI_CAP_EINVAL16) {

		tmp16 = pcie_serr_disable_flag ?
		    (pcie_root_ctrl_default & ~PCIE_ROOT_SYS_ERR) :
		    pcie_root_ctrl_default;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, tmp16);
		PCIE_DBG_CAP(dip, bus_p, "ROOT DEVCTL", 16, PCIE_ROOTCTL,
		    reg16);
	}

	/*
	 * Enable PCI-Express Advanced Error Handling if it exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		return;

	/* Set Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE SEV", 32, PCIE_AER_UCE_SERV,
		    reg32);
	}

	/* Enable Uncorrectable errors */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_uce_mask;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER UCE MASK", 32, PCIE_AER_UCE_MASK,
		    reg32);
	}

	/* Enable ECRC generation and checking */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = reg32 | pcie_ecrc_value;
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER CTL", 32, PCIE_AER_CTL, reg32);
	}

	/* Enable Secondary Uncorrectable errors if this is a bridge */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	/* Set Secondary Uncorrectable Severity */
	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_SERV)) !=
	    PCI_CAP_EINVAL32) {
		tmp32 = pcie_aer_suce_severity;

		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_SERV, tmp32);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE SEV", 32, PCIE_AER_SUCE_SERV,
		    reg32);
	}

	if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_MASK)) !=
	    PCI_CAP_EINVAL32) {
		PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, pcie_aer_suce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER SUCE MASK", 32,
		    PCIE_AER_SUCE_MASK, reg32);
	}

root:
	/*
	 * Enable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if ((reg16 = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD,
		    pcie_root_error_cmd_default);
		PCIE_DBG_AER(dip, bus_p, "AER Root Err Cmd", 16,
		    PCIE_AER_RE_CMD, reg16);
	}
}

/*
 * This function is used for enabling CE reporting and setting the AER CE mask.
 * When called from outside the pcie module it should always be preceded by
 * a call to pcie_enable_errors.
 */
int
pcie_enable_ce(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t device_sts, device_ctl;
	uint32_t tmp_pcie_aer_ce_mask;

	if (!PCIE_IS_PCIE(bus_p))
		return (DDI_SUCCESS);

	/*
	 * The "pcie_ce_mask" property is used to control both the CE reporting
	 * enable field in the device control register and the AER CE mask. We
	 * leave CE reporting disabled if pcie_ce_mask is set to -1.
	 */

	tmp_pcie_aer_ce_mask = (uint32_t)ddi_prop_get_int(DDI_DEV_T_ANY, dip,
	    DDI_PROP_DONTPASS, "pcie_ce_mask", pcie_aer_ce_mask);

	if (tmp_pcie_aer_ce_mask == (uint32_t)-1) {
		/*
		 * Nothing to do since CE reporting has already been disabled.
		 */
		return (DDI_SUCCESS);
	}

	if (PCIE_HAS_AER(bus_p)) {
		/* Enable AER CE */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, tmp_pcie_aer_ce_mask);
		PCIE_DBG_AER(dip, bus_p, "AER CE MASK", 32, PCIE_AER_CE_MASK,
		    0);

		/* Clear any pending AER CE errors */
		PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_STS, -1);
	}

	/* clear any pending CE errors */
	if ((device_sts = PCIE_CAP_GET(16, bus_p, PCIE_DEVSTS)) !=
	    PCI_CAP_EINVAL16)
		PCIE_CAP_PUT(16, bus_p, PCIE_DEVSTS,
		    device_sts & (~PCIE_DEVSTS_CE_DETECTED));

	/* Enable CE reporting */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL,
	    (device_ctl & (~PCIE_DEVCTL_ERR_MASK)) | pcie_base_err_default);
	PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, device_ctl);

	return (DDI_SUCCESS);
}

/* ARGSUSED */
void
pcie_disable_errors(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t device_ctl;
	uint32_t aer_reg;

	if (!PCIE_IS_PCIE(bus_p))
		return;

	/*
	 * Disable PCI-Express Baseline Error Handling
	 */
	device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL);
	device_ctl &= ~PCIE_DEVCTL_ERR_MASK;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, device_ctl);

	/*
	 * Disable PCI-Express Advanced Error Handling if it exists
	 */
	if (!PCIE_HAS_AER(bus_p))
		goto root;

	/* Disable Uncorrectable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, PCIE_AER_UCE_BITS);

	/* Disable Correctable errors */
	PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, PCIE_AER_CE_BITS);

	/* Disable ECRC generation and checking */
	if ((aer_reg = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) !=
	    PCI_CAP_EINVAL32) {
		aer_reg &= ~(PCIE_AER_CTL_ECRC_GEN_ENA |
		    PCIE_AER_CTL_ECRC_CHECK_ENA);

		PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, aer_reg);
	}
	/*
	 * Disable Secondary Uncorrectable errors if this is a bridge
	 */
	if (!PCIE_IS_PCIE_BDG(bus_p))
		goto root;

	PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, PCIE_AER_SUCE_BITS);

root:
	/*
	 * Disable Root Control if this is a Root device
	 */
	if (!PCIE_IS_ROOT(bus_p))
		return;

	if (!pcie_serr_disable_flag) {
		device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL);
		device_ctl &= ~PCIE_ROOT_SYS_ERR;
		PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, device_ctl);
	}

	if (!PCIE_HAS_AER(bus_p))
		return;

	/*
	 * The Root Error Command register lives in the AER capability, so it
	 * is accessed through the AER accessors, matching pcie_enable_errors.
	 */
	if ((device_ctl = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) !=
	    PCI_CAP_EINVAL16) {
		device_ctl &= ~pcie_root_error_cmd_default;
		PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD, device_ctl);
	}
}

/*
 * Extract bdf from "reg" property.
 */
int
pcie_get_bdf_from_dip(dev_info_t *dip, pcie_req_id_t *bdf)
{
	pci_regspec_t *regspec;
	int reglen;

	if (ddi_prop_lookup_int_array(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS,
	    "reg", (int **)&regspec, (uint_t *)&reglen) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if (reglen < (sizeof (pci_regspec_t) / sizeof (int))) {
		ddi_prop_free(regspec);
		return (DDI_FAILURE);
	}

	/* Get phys_hi from first element. All have same bdf. */
	*bdf = (regspec->pci_phys_hi & (PCI_REG_BDFR_M ^ PCI_REG_REG_M)) >> 8;

	ddi_prop_free(regspec);
	return (DDI_SUCCESS);
}

dev_info_t *
pcie_get_my_childs_dip(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip = rdip;

	for (; ddi_get_parent(cdip) != dip; cdip = ddi_get_parent(cdip))
		;

	return (cdip);
}

uint32_t
pcie_get_bdf_for_dma_xfer(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip;

	/*
	 * As part of the probing, the PCI fcode interpreter may set up a DMA
	 * request if a given card has fcode on it, using the dip and rdip of
	 * the hotplug connector, i.e., the dip and rdip of the px/pcieb
	 * driver. In this case, return an invalid value for the bdf since we
	 * cannot get to the bdf value of the actual device which will be
	 * initiating this DMA.
	 */
	if (rdip == dip)
		return (PCIE_INVALID_BDF);

	cdip = pcie_get_my_childs_dip(dip, rdip);

	/*
	 * For a given rdip, return the bdf value of dip's (px or pcieb)
	 * immediate child or secondary bus-id if dip is a PCIe2PCI bridge.
	 *
	 * XXX - For now, return an invalid bdf value for all PCI and PCI-X
	 * devices since this needs more work.
	 */
	return (PCI_GET_PCIE2PCI_SECBUS(cdip) ?
	    PCIE_INVALID_BDF : PCI_GET_BDF(cdip));
}

uint32_t
pcie_get_aer_uce_mask()
{
	return (pcie_aer_uce_mask);
}
uint32_t
pcie_get_aer_ce_mask()
{
	return (pcie_aer_ce_mask);
}
uint32_t
pcie_get_aer_suce_mask()
{
	return (pcie_aer_suce_mask);
}
uint32_t
pcie_get_serr_mask()
{
	return (pcie_serr_disable_flag);
}

void
pcie_set_aer_uce_mask(uint32_t mask)
{
	pcie_aer_uce_mask = mask;
	if (mask & PCIE_AER_UCE_UR)
		pcie_base_err_default &= ~PCIE_DEVCTL_UR_REPORTING_EN;
	else
		pcie_base_err_default |= PCIE_DEVCTL_UR_REPORTING_EN;

	if (mask & PCIE_AER_UCE_ECRC)
		pcie_ecrc_value = 0;
}

void
pcie_set_aer_ce_mask(uint32_t mask)
{
	pcie_aer_ce_mask = mask;
}
void
pcie_set_aer_suce_mask(uint32_t mask)
{
	pcie_aer_suce_mask = mask;
}
void
pcie_set_serr_mask(uint32_t mask)
{
	pcie_serr_disable_flag = mask;
}

/*
 * Is the rdip a child of dip? Used to keep certain CTLOPS from bubbling
 * up erroneously, e.g. ISA ctlops to a PCI-PCI Bridge.
 */
boolean_t
pcie_is_child(dev_info_t *dip, dev_info_t *rdip)
{
	dev_info_t *cdip = ddi_get_child(dip);
	for (; cdip; cdip = ddi_get_next_sibling(cdip))
		if (cdip == rdip)
			break;
	return (cdip != NULL);
}

boolean_t
pcie_is_link_disabled(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (PCIE_IS_PCIE(bus_p)) {
		if (PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL) &
		    PCIE_LINKCTL_LINK_DISABLE)
			return (B_TRUE);
	}
	return (B_FALSE);
}

/*
 * Determines if there are any root ports attached to a root complex.
 *
 * dip - dip of root complex
 *
 * Returns - DDI_SUCCESS if there is at least one root port otherwise
 * DDI_FAILURE.
 */
int
pcie_root_port(dev_info_t *dip)
{
	int port_type;
	uint16_t cap_ptr;
	ddi_acc_handle_t config_handle;
	dev_info_t *cdip = ddi_get_child(dip);

	/*
	 * Determine if any of the children of the passed in dip
	 * are root ports.
	 */
	for (; cdip; cdip = ddi_get_next_sibling(cdip)) {

		if (pci_config_setup(cdip, &config_handle) != DDI_SUCCESS)
			continue;

		if ((PCI_CAP_LOCATE(config_handle, PCI_CAP_ID_PCI_E,
		    &cap_ptr)) == DDI_FAILURE) {
			pci_config_teardown(&config_handle);
			continue;
		}

		port_type = PCI_CAP_GET16(config_handle, 0, cap_ptr,
		    PCIE_PCIECAP) & PCIE_PCIECAP_DEV_TYPE_MASK;

		pci_config_teardown(&config_handle);

		if (port_type == PCIE_PCIECAP_DEV_TYPE_ROOT)
			return (DDI_SUCCESS);
	}

	/* No root ports were found */

	return (DDI_FAILURE);
}

/*
 * Function that determines if a device is a PCIe device.
 *
 * dip - dip of device.
 *
 * Returns - DDI_SUCCESS if device is a PCIe device, otherwise DDI_FAILURE.
 */
int
pcie_dev(dev_info_t *dip)
{
	/* get parent device's device_type property */
	char *device_type;
	int rc = DDI_FAILURE;
	dev_info_t *pdip = ddi_get_parent(dip);

	if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip,
	    DDI_PROP_DONTPASS, "device_type", &device_type)
	    != DDI_PROP_SUCCESS) {
		return (DDI_FAILURE);
	}

	if (strcmp(device_type, "pciex") == 0)
		rc = DDI_SUCCESS;
	else
		rc = DDI_FAILURE;

	ddi_prop_free(device_type);
	return (rc);
}

void
pcie_set_rber_fatal(dev_info_t *dip, boolean_t val)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	bus_p->bus_pfd->pe_rber_fatal = val;
}

/*
 * Return parent Root Port's pe_rber_fatal value.
 */
boolean_t
pcie_get_rber_fatal(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip);
	pcie_bus_t *rp_bus_p = PCIE_DIP2UPBUS(bus_p->bus_rp_dip);
	return (rp_bus_p->bus_pfd->pe_rber_fatal);
}

int
pcie_ari_supported(dev_info_t *dip)
{
	uint32_t devcap2;
	uint16_t pciecap;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint8_t dev_type;

	PCIE_DBG("pcie_ari_supported: dip=%p\n", dip);

	if (bus_p == NULL)
		return (PCIE_ARI_FORW_NOT_SUPPORTED);

	dev_type = bus_p->bus_dev_type;

	if ((dev_type != PCIE_PCIECAP_DEV_TYPE_DOWN) &&
	    (dev_type != PCIE_PCIECAP_DEV_TYPE_ROOT))
		return (PCIE_ARI_FORW_NOT_SUPPORTED);

	if (pcie_disable_ari) {
		PCIE_DBG("pcie_ari_supported: dip=%p: ARI Disabled\n", dip);
		return (PCIE_ARI_FORW_NOT_SUPPORTED);
	}

	pciecap = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP);

	if ((pciecap & PCIE_PCIECAP_VER_MASK) < PCIE_PCIECAP_VER_2_0) {
		PCIE_DBG("pcie_ari_supported: dip=%p: Not 2.0\n", dip);
		return (PCIE_ARI_FORW_NOT_SUPPORTED);
	}

	devcap2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCAP2);

	PCIE_DBG("pcie_ari_supported: dip=%p: DevCap2=0x%x\n",
	    dip, devcap2);

	if (devcap2 & PCIE_DEVCAP2_ARI_FORWARD) {
		PCIE_DBG("pcie_ari_supported: "
		    "dip=%p: ARI Forwarding is supported\n", dip);
		return (PCIE_ARI_FORW_SUPPORTED);
	}
	return (PCIE_ARI_FORW_NOT_SUPPORTED);
}

int
pcie_ari_enable(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_enable: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (DDI_FAILURE);

	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);
	devctl2 |= PCIE_DEVCTL2_ARI_FORWARD_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2);

	PCIE_DBG("pcie_ari_enable: dip=%p: writing 0x%x to DevCtl2\n",
	    dip, devctl2);

	return (DDI_SUCCESS);
}

int
pcie_ari_disable(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_disable: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (DDI_FAILURE);

	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);
	devctl2 &= ~PCIE_DEVCTL2_ARI_FORWARD_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2);

	PCIE_DBG("pcie_ari_disable: dip=%p: writing 0x%x to DevCtl2\n",
	    dip, devctl2);

	return (DDI_SUCCESS);
}

int
pcie_ari_is_enabled(dev_info_t *dip)
{
	uint16_t devctl2;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	PCIE_DBG("pcie_ari_is_enabled: dip=%p\n", dip);

	if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED)
		return (PCIE_ARI_FORW_DISABLED);

	/* Device Control 2 is a 16-bit register */
	devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2);

	PCIE_DBG("pcie_ari_is_enabled: dip=%p: DevCtl2=0x%x\n",
	    dip, devctl2);

	if (devctl2 & PCIE_DEVCTL2_ARI_FORWARD_EN) {
		PCIE_DBG("pcie_ari_is_enabled: "
		    "dip=%p: ARI Forwarding is enabled\n", dip);
		return (PCIE_ARI_FORW_ENABLED);
	}

	return (PCIE_ARI_FORW_DISABLED);
}

int
pcie_ari_device(dev_info_t *dip)
{
	ddi_acc_handle_t handle;
	uint16_t cap_ptr;

	PCIE_DBG("pcie_ari_device: dip=%p\n", dip);

	/*
	 * XXX - This function may be called before the bus_p structure
	 * has been populated. This code can be changed to remove
	 * pci_config_setup()/pci_config_teardown() when the RFE
	 * to populate the bus_p structures early in boot is putback.
	 */

	/* First make sure it is a PCIe device */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (PCIE_NOT_ARI_DEVICE);

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_ID_PCI_E, &cap_ptr))
	    != DDI_SUCCESS) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* Locate the ARI Capability */

	if ((PCI_CAP_LOCATE(handle, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI),
	    &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (PCIE_NOT_ARI_DEVICE);
	}

	/* ARI Capability was found so it must be an ARI device */
	PCIE_DBG("pcie_ari_device: ARI Device dip=%p\n", dip);

	pci_config_teardown(&handle);
	return (PCIE_ARI_DEVICE);
}

int
pcie_ari_get_next_function(dev_info_t *dip, int *func)
{
	uint32_t val;
	uint16_t cap_ptr, next_function;
	ddi_acc_handle_t handle;

	/*
	 * XXX - This function may be called before the bus_p structure
	 * has been populated. This code can be changed to remove
	 * pci_config_setup()/pci_config_teardown() when the RFE
	 * to populate the bus_p structures early in boot is putback.
	 */

	if (pci_config_setup(dip, &handle) != DDI_SUCCESS)
		return (DDI_FAILURE);

	if ((PCI_CAP_LOCATE(handle,
	    PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI), &cap_ptr)) == DDI_FAILURE) {
		pci_config_teardown(&handle);
		return (DDI_FAILURE);
	}

	val = PCI_CAP_GET32(handle, 0, cap_ptr, PCIE_ARI_CAP);

	next_function = (val >> PCIE_ARI_CAP_NEXT_FUNC_SHIFT) &
	    PCIE_ARI_CAP_NEXT_FUNC_MASK;

	pci_config_teardown(&handle);

	*func = next_function;

	return (DDI_SUCCESS);
}

dev_info_t *
pcie_func_to_dip(dev_info_t *dip, pcie_req_id_t function)
{
	pcie_req_id_t child_bdf;
	dev_info_t *cdip;

	for (cdip = ddi_get_child(dip); cdip;
	    cdip = ddi_get_next_sibling(cdip)) {

		if (pcie_get_bdf_from_dip(cdip, &child_bdf) == DDI_FAILURE)
			return (NULL);

		if ((child_bdf & PCIE_REQ_ID_ARI_FUNC_MASK) == function)
			return (cdip);
	}
	return (NULL);
}

#ifdef DEBUG

static void
pcie_print_bus(pcie_bus_t *bus_p)
{
	pcie_dbg("\tbus_dip = 0x%p\n", bus_p->bus_dip);
	pcie_dbg("\tbus_fm_flags = 0x%x\n", bus_p->bus_fm_flags);

	pcie_dbg("\tbus_bdf = 0x%x\n", bus_p->bus_bdf);
	pcie_dbg("\tbus_dev_ven_id = 0x%x\n", bus_p->bus_dev_ven_id);
	pcie_dbg("\tbus_rev_id = 0x%x\n", bus_p->bus_rev_id);
	pcie_dbg("\tbus_hdr_type = 0x%x\n", bus_p->bus_hdr_type);
	pcie_dbg("\tbus_dev_type = 0x%x\n", bus_p->bus_dev_type);
	pcie_dbg("\tbus_bdg_secbus = 0x%x\n", bus_p->bus_bdg_secbus);
	pcie_dbg("\tbus_pcie_off = 0x%x\n", bus_p->bus_pcie_off);
	pcie_dbg("\tbus_aer_off = 0x%x\n", bus_p->bus_aer_off);
	pcie_dbg("\tbus_pcix_off = 0x%x\n", bus_p->bus_pcix_off);
	pcie_dbg("\tbus_ecc_ver = 0x%x\n", bus_p->bus_ecc_ver);
}

/*
 * For debugging purposes set pcie_dbg_print != 0 to see printf messages
 * during interrupt.
 *
 * When a proper solution is in place this code will disappear.
 * Potential solutions are:
 * o circular buffers
 * o taskq to print at lower pil
 */
int pcie_dbg_print = 0;
void
pcie_dbg(char *fmt, ...)
{
	va_list ap;

	if (!pcie_debug_flags) {
		return;
	}
	va_start(ap, fmt);
	if (servicing_interrupt()) {
		if (pcie_dbg_print) {
			prom_vprintf(fmt, ap);
		}
	} else {
		prom_vprintf(fmt, ap);
	}
	va_end(ap);
}
#endif /* DEBUG */

#if defined(__x86)
static void
pcie_check_io_mem_range(ddi_acc_handle_t cfg_hdl, boolean_t *empty_io_range,
    boolean_t *empty_mem_range)
{
	uint8_t class, subclass;
	uint_t val;

	class = pci_config_get8(cfg_hdl, PCI_CONF_BASCLASS);
	subclass = pci_config_get8(cfg_hdl, PCI_CONF_SUBCLASS);

	if ((class == PCI_CLASS_BRIDGE) && (subclass == PCI_BRIDGE_PCI)) {
		val = (((uint_t)pci_config_get8(cfg_hdl, PCI_BCNF_IO_BASE_LOW) &
		    PCI_BCNF_IO_MASK) << 8);
		/*
		 * Assuming that a zero based io_range[0] implies an
		 * invalid I/O range. Likewise for mem_range[0].
		 */
		if (val == 0)
			*empty_io_range = B_TRUE;
		val = (((uint_t)pci_config_get16(cfg_hdl, PCI_BCNF_MEM_BASE) &
		    PCI_BCNF_MEM_MASK) << 16);
		if (val == 0)
			*empty_mem_range = B_TRUE;
	}
}

#endif /* defined(__x86) */

boolean_t
pcie_link_bw_supported(dev_info_t *dip)
{
	uint32_t linkcap;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (B_FALSE);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (B_FALSE);
	}

	linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP);
	return ((linkcap & PCIE_LINKCAP_LINK_BW_NOTIFY_CAP) != 0);
}

int
pcie_link_bw_enable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (pcie_disable_lbw != 0) {
		return (DDI_FAILURE);
	}

	if (!pcie_link_bw_supported(dip)) {
		return (DDI_FAILURE);
	}

	mutex_init(&bus_p->bus_lbw_mutex, NULL, MUTEX_DRIVER, NULL);
	cv_init(&bus_p->bus_lbw_cv, NULL, CV_DRIVER, NULL);
	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl |= PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl |= PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_pbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_cbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP);
	bus_p->bus_lbw_state |= PCIE_LBW_S_ENABLED;

	return (DDI_SUCCESS);
}

int
pcie_link_bw_disable(dev_info_t *dip)
{
	uint16_t linkctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_FAILURE);
	}

	mutex_enter(&bus_p->bus_lbw_mutex);
	while ((bus_p->bus_lbw_state &
	    (PCIE_LBW_S_DISPATCHED | PCIE_LBW_S_RUNNING)) != 0) {
		cv_wait(&bus_p->bus_lbw_cv, &bus_p->bus_lbw_mutex);
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	linkctl &= ~PCIE_LINKCTL_LINK_BW_INTR_EN;
	linkctl &= ~PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl);

	bus_p->bus_lbw_state &= ~PCIE_LBW_S_ENABLED;
	kmem_free(bus_p->bus_lbw_pbuf, MAXPATHLEN);
	kmem_free(bus_p->bus_lbw_cbuf, MAXPATHLEN);
	bus_p->bus_lbw_pbuf = NULL;
	bus_p->bus_lbw_cbuf = NULL;

	mutex_destroy(&bus_p->bus_lbw_mutex);
	cv_destroy(&bus_p->bus_lbw_cv);

	return (DDI_SUCCESS);
}

void
pcie_link_bw_taskq(void *arg)
{
	dev_info_t *dip = arg;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	dev_info_t *cdip;
	boolean_t again;
	sysevent_t *se;
	sysevent_value_t se_val;
	sysevent_id_t eid;
	sysevent_attr_list_t *ev_attr_list;
	int circular;

top:
	ndi_devi_enter(dip, &circular);
	se = NULL;
	ev_attr_list = NULL;
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_DISPATCHED;
	bus_p->bus_lbw_state |= PCIE_LBW_S_RUNNING;
	mutex_exit(&bus_p->bus_lbw_mutex);

	/*
	 * Update our own speeds as we've likely changed something.
	 */
	pcie_capture_speeds(dip);

	/*
	 * Walk our children. We only care about updating this on function 0
	 * because the PCIe specification requires that these all be the same
	 * otherwise.
	 */
	for (cdip = ddi_get_child(dip); cdip != NULL;
	    cdip = ddi_get_next_sibling(cdip)) {
		pcie_bus_t *cbus_p = PCIE_DIP2BUS(cdip);

		if (cbus_p == NULL) {
			continue;
		}

		if ((cbus_p->bus_bdf & PCIE_REQ_ID_FUNC_MASK) != 0) {
			continue;
		}

		/*
		 * It's possible that this can fire while a child is otherwise
		 * only partially constructed. Therefore, if we don't have the
		 * config handle, don't bother updating the child.
		 */
		if (cbus_p->bus_cfg_hdl == NULL) {
			continue;
		}

		pcie_capture_speeds(cdip);
		break;
	}

	se = sysevent_alloc(EC_PCIE, ESC_PCIE_LINK_STATE,
	    ILLUMOS_KERN_PUB "pcie", SE_SLEEP);

	(void) ddi_pathname(dip, bus_p->bus_lbw_pbuf);
	se_val.value_type = SE_DATA_TYPE_STRING;
	se_val.value.sv_string = bus_p->bus_lbw_pbuf;
	if (sysevent_add_attr(&ev_attr_list, PCIE_EV_DETECTOR_PATH, &se_val,
	    SE_SLEEP) != 0) {
		ndi_devi_exit(dip, circular);
		goto err;
	}

	if (cdip != NULL) {
		(void) ddi_pathname(cdip, bus_p->bus_lbw_cbuf);

		se_val.value_type = SE_DATA_TYPE_STRING;
		se_val.value.sv_string = bus_p->bus_lbw_cbuf;

		/*
		 * If this fails, that's OK. We'd rather get the event off and
		 * there's a chance that there may not be anything there for us.
		 */
		(void) sysevent_add_attr(&ev_attr_list, PCIE_EV_CHILD_PATH,
		    &se_val, SE_SLEEP);
	}

	ndi_devi_exit(dip, circular);

	/*
	 * Before we generate and send down a sysevent, we need to tell the
	 * system that parts of the devinfo cache need to be invalidated. While
	 * the function below takes several args, it ignores them all. Because
	 * this is a global invalidation, we don't bother trying to do much more
	 * than requesting a global invalidation, lest we accidentally kick off
	 * several in a row.
	 */
	ddi_prop_cache_invalidate(DDI_DEV_T_NONE, NULL, NULL, 0);

	if (sysevent_attach_attributes(se, ev_attr_list) != 0) {
		goto err;
	}
	ev_attr_list = NULL;

	if (log_sysevent(se, SE_SLEEP, &eid) != 0) {
		goto err;
	}

err:
	sysevent_free_attr(ev_attr_list);
	sysevent_free(se);

	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_state &= ~PCIE_LBW_S_RUNNING;
	cv_broadcast(&bus_p->bus_lbw_cv);
	again = (bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) != 0;
	mutex_exit(&bus_p->bus_lbw_mutex);

	if (again) {
		goto top;
	}
}

int
pcie_link_bw_intr(dev_info_t *dip)
{
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);
	uint16_t linksts;
	uint16_t flags = PCIE_LINKSTS_LINK_BW_MGMT | PCIE_LINKSTS_AUTO_BW;

	if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	linksts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
	if ((linksts & flags) == 0) {
		return (DDI_INTR_UNCLAIMED);
	}

	/*
	 * Check if we've already dispatched this event. If we have already
	 * dispatched it, then there's nothing else to do, we coalesce multiple
	 * events.
	 */
	mutex_enter(&bus_p->bus_lbw_mutex);
	bus_p->bus_lbw_nevents++;
	if ((bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) == 0) {
		if ((bus_p->bus_lbw_state & PCIE_LBW_S_RUNNING) == 0) {
			taskq_dispatch_ent(pcie_link_tq, pcie_link_bw_taskq,
			    dip, 0, &bus_p->bus_lbw_ent);
		}

		bus_p->bus_lbw_state |= PCIE_LBW_S_DISPATCHED;
	}
	mutex_exit(&bus_p->bus_lbw_mutex);

	PCIE_CAP_PUT(16, bus_p, PCIE_LINKSTS, flags);
	return (DDI_INTR_CLAIMED);
}

int
pcie_link_set_target(dev_info_t *dip, pcie_link_speed_t speed)
{
	uint16_t ctl2, rval;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	if (bus_p->bus_pcie_vers < 2) {
		return (ENOTSUP);
	}

	switch (speed) {
	case PCIE_LINK_SPEED_2_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_2_5;
		break;
	case PCIE_LINK_SPEED_5:
		rval = PCIE_LINKCTL2_TARGET_SPEED_5;
		break;
	case PCIE_LINK_SPEED_8:
		rval = PCIE_LINKCTL2_TARGET_SPEED_8;
		break;
	case PCIE_LINK_SPEED_16:
		rval = PCIE_LINKCTL2_TARGET_SPEED_16;
		break;
	case PCIE_LINK_SPEED_32:
		rval = PCIE_LINKCTL2_TARGET_SPEED_32;
		break;
	case PCIE_LINK_SPEED_64:
		rval = PCIE_LINKCTL2_TARGET_SPEED_64;
		break;
	default:
		return (EINVAL);
	}

	mutex_enter(&bus_p->bus_speed_mutex);
	if ((bus_p->bus_sup_speed & speed) == 0) {
		mutex_exit(&bus_p->bus_speed_mutex);
		return (ENOTSUP);
	}

	bus_p->bus_target_speed = speed;
	bus_p->bus_speed_flags |= PCIE_LINK_F_ADMIN_TARGET;

	ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2);
	ctl2 &= ~PCIE_LINKCTL2_TARGET_SPEED_MASK;
	ctl2 |= rval;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL2, ctl2);
	mutex_exit(&bus_p->bus_speed_mutex);

	/*
	 * Make sure our updates
	 * have been reflected in devinfo.
	 */
	pcie_capture_speeds(dip);

	return (0);
}

int
pcie_link_retrain(dev_info_t *dip)
{
	uint16_t ctl;
	pcie_bus_t *bus_p = PCIE_DIP2BUS(dip);

	if (!PCIE_IS_PCIE(bus_p)) {
		return (ENOTSUP);
	}

	if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) {
		return (ENOTSUP);
	}

	/*
	 * The PCIe specification suggests that we make sure that the link isn't
	 * in training before issuing this command in case there was a state
	 * machine transition prior to when we got here. We wait and then go
	 * ahead and issue the command anyways.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	ctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL);
	ctl |= PCIE_LINKCTL_RETRAIN_LINK;
	PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, ctl);

	/*
	 * Wait again to see if it clears before returning to the user.
	 */
	for (uint32_t i = 0; i < pcie_link_retrain_count; i++) {
		uint16_t sts;

		sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS);
		if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0)
			break;
		delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000));
	}

	return (0);
}

/*
 * Here we're going through and grabbing information about a given PCIe device.
 * Our situation is a little bit complicated at this point. This gets invoked
 * both during early initialization and during hotplug events.
 * We cannot rely on the device node having been fully set up, that is, while
 * the pcie_bus_t normally contains a ddi_acc_handle_t for configuration space,
 * that may not be valid yet as this can occur before child initialization or
 * we may be dealing with a function that will never have a handle.
 *
 * However, we should always have a fully furnished pcie_bus_t, which means that
 * we can get its bdf and use that to access the device's configuration space.
 */
static int
pcie_fabric_feature_scan(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	uint32_t devcap;
	uint16_t mps;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;

	/*
	 * Skip over non-PCIe devices. If we encounter something here, we don't
	 * bother going through any of its children because we don't have reason
	 * to believe that a PCIe device that this will impact will exist below
	 * this. While it is possible that there's a PCIe fabric downstream an
	 * intermediate old PCI/PCI-X bus, at that point, we'll still trigger
	 * our complex fabric detection and use the minimums.
	 *
	 * The reason this doesn't trigger an immediate flagging as a complex
	 * case like the one below is because we could be scanning a device that
	 * is a nexus driver and has children already (albeit that would be
	 * somewhat surprising as we don't anticipate being called at this
	 * point).
	 */
	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * If we fail to find a pcie_bus_t for some reason, that's somewhat
	 * surprising. We log this fact and set the complex flag and indicate it
	 * was because of this case. This immediately transitions us to a
	 * "complex" case which means use the minimal, safe, settings.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL) {
		dev_err(dip, CE_WARN, "failed to find associated pcie_bus_t "
		    "during fabric scan");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	/*
	 * In a similar case, there is hardware out there which is a PCIe
	 * device, but does not advertise a PCIe capability. An example of this
	 * is the IDT Tsi382A which can hide its PCIe capability. If this is
	 * the case, we immediately terminate scanning and flag this as a
	 * 'complex' case which causes us to use guaranteed safe settings.
	 */
	if (bus_p->bus_pcie_off == 0) {
		dev_err(dip, CE_WARN, "encountered PCIe device without PCIe "
		    "capability");
		fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
		return (DDI_WALK_TERMINATE);
	}

	rcdip = pcie_get_rc_dip(dip);

	/*
	 * First, start by determining what the device's tagging and max packet
	 * size is. All PCIe devices will always have the 8-bit tag information
	 * as this has existed since PCIe 1.0. 10-bit tagging requires a V2
	 * PCIe capability. 14-bit requires the DEV3 cap. If we are missing a
	 * version or capability, then we always treat that as lacking the bits
	 * in the fabric.
	 */
	ASSERT3U(bus_p->bus_pcie_off, !=, 0);
	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	mps = devcap & PCIE_DEVCAP_MAX_PAYLOAD_MASK;
	if (mps < fab->pfd_mps_found) {
		fab->pfd_mps_found = mps;
	}

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) == 0) {
		fab->pfd_tag_found &= ~PCIE_TAG_8B;
	}

	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);
		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP;
	}

	if (bus_p->bus_dev3_off != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);
		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_COMP_SUP) == 0) {
			fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
		}
	} else {
		fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP;
	}

	/*
	 * Now that we have captured device information, we must go and ask
	 * questions of the topology here. The big theory statement enumerates
	 * several types of cases. The big question we need to answer is have we
	 * encountered a hotpluggable bridge that means we need to mark this as
	 * complex.
	 *
	 * The big theory statement notes several different kinds of hotplug
	 * topologies that exist that we can theoretically support. Right now we
	 * opt to keep our lives simple and focus solely on (4) and (5). These
	 * can both be summarized by a single, fairly straightforward rule:
	 *
	 * The only allowed hotpluggable entity is a root port.
	 *
	 * The reason that this can work and detect cases like (6), (7), and our
	 * other invalid ones is that the hotplug code will scan and find all
	 * children before we are called into here.
	 */
	if (bus_p->bus_hp_sup_modes != 0) {
		/*
		 * We opt to terminate in this case because there's no value in
		 * scanning the rest of the tree at this point.
		 */
		if (!PCIE_IS_RP(bus_p)) {
			fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX;
			return (DDI_WALK_TERMINATE);
		}

		fab->pfd_flags |= PCIE_FABRIC_F_RP_HP;
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and their children as those would be
	 * different PCIe fabric domains for us to scan. In many hardware
	 * platforms multiple root ports are all at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

static int
pcie_fabric_feature_set(dev_info_t *dip, void *arg)
{
	pcie_bus_t *bus_p;
	dev_info_t *rcdip;
	pcie_fabric_data_t *fab = arg;
	uint32_t devcap, devctl;

	if (pcie_dev(dip) != DDI_SUCCESS) {
		return (DDI_WALK_PRUNECHILD);
	}

	/*
	 * The missing bus_t sent us into the complex case previously. We still
	 * need to make sure all devices have values we expect here and thus
	 * don't terminate like the above. The same is true for the case where
	 * there is no PCIe capability.
	 */
	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || bus_p->bus_pcie_off == 0) {
		return (DDI_WALK_CONTINUE);
	}
	rcdip = pcie_get_rc_dip(dip);

	devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCAP);
	devctl = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL);

	if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_8B) != 0) {
		devctl |= PCIE_DEVCTL_EXT_TAG_FIELD_EN;
	}

	devctl &= ~PCIE_DEVCTL_MAX_PAYLOAD_MASK;
	ASSERT0(fab->pfd_mps_act & ~PCIE_DEVCAP_MAX_PAYLOAD_MASK);
	devctl |= fab->pfd_mps_act << PCIE_DEVCTL_MAX_PAYLOAD_SHIFT;

	pci_cfgacc_put16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off +
	    PCIE_DEVCTL, devctl);

	if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0 &&
	    (fab->pfd_tag_act & PCIE_TAG_10B_COMP) != 0) {
		uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_pcie_off + PCIE_DEVCAP2);

		/*
		 * Only enable 10-bit tag requests on devices that advertise
		 * requester support.
		 */
		if ((devcap2 & PCIE_DEVCAP2_10B_TAG_REQ_SUP) != 0) {
			uint16_t devctl2 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_pcie_off + PCIE_DEVCTL2);
			devctl2 |= PCIE_DEVCTL2_10B_TAG_REQ_EN;
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_pcie_off + PCIE_DEVCTL2, devctl2);
		}
	}

	if (bus_p->bus_dev3_off != 0 &&
	    (fab->pfd_tag_act & PCIE_TAG_14B_COMP) != 0) {
		uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf,
		    bus_p->bus_dev3_off + PCIE_DEVCAP3);

		/*
		 * Likewise for 14-bit tag requests. The control register lives
		 * in the Device 3 capability, so write it back there.
		 */
		if ((devcap3 & PCIE_DEVCAP3_14B_TAG_REQ_SUP) != 0) {
			uint16_t devctl3 = pci_cfgacc_get16(rcdip,
			    bus_p->bus_bdf, bus_p->bus_dev3_off + PCIE_DEVCTL3);
			devctl3 |= PCIE_DEVCTL3_14B_TAG_REQ_EN;
			pci_cfgacc_put16(rcdip, bus_p->bus_bdf,
			    bus_p->bus_dev3_off + PCIE_DEVCTL3, devctl3);
		}
	}

	/*
	 * As our walk starts at a root port, we need to make sure that we don't
	 * pick up any of its siblings and
	 * their children as those would be different PCIe fabric domains for
	 * us to scan. In many hardware platforms multiple root ports are all
	 * at the same level in the tree.
	 */
	if (bus_p->bus_rp_dip == dip) {
		return (DDI_WALK_PRUNESIB);
	}

	return (DDI_WALK_CONTINUE);
}

/*
 * This is used to scan and determine the total set of PCIe fabric settings that
 * we should have in the system for everything downstream of this specified root
 * port. Note, it is only really safe to call this while working from the
 * perspective of a root port as we will be walking down the entire device tree.
 *
 * However, our callers, particularly hotplug, don't have all the information
 * we'd like. In particular, we need to check that:
 *
 * o This is actually a PCIe device.
 * o That this is a root port (see the big theory statement to understand this
 *   constraint).
 */
void
pcie_fabric_setup(dev_info_t *dip)
{
	pcie_bus_t *bus_p;
	pcie_fabric_data_t *fab;
	dev_info_t *pdip;
	int circular_count;

	bus_p = PCIE_DIP2BUS(dip);
	if (bus_p == NULL || !PCIE_IS_RP(bus_p)) {
		return;
	}

	VERIFY3P(bus_p->bus_fab, !=, NULL);
	fab = bus_p->bus_fab;

	/*
	 * For us to call ddi_walk_devs(), our parent needs to be held.
	 * ddi_walk_devs() will take care of grabbing our dip as part of its
	 * walk before we iterate over our children.
	 *
	 * A reasonable question to ask here is why is it safe to ask for our
	 * parent? In this case, because we have entered here through some
	 * thread that's operating on us whether as part of attach or a hotplug
	 * event, our dip somewhat by definition has to be valid. If we were
	 * looking at our dip's children and then asking them for a parent, then
	 * that would be a race condition.
	 */
	pdip = ddi_get_parent(dip);
	VERIFY3P(pdip, !=, NULL);
	ndi_devi_enter(pdip, &circular_count);
	fab->pfd_flags |= PCIE_FABRIC_F_SCANNING;

	/*
	 * Reinitialize the tracking structure to basically set the maximum
	 * caps. These will be chipped away during the scan.
	 */
	fab->pfd_mps_found = PCIE_DEVCAP_MAX_PAYLOAD_4096;
	fab->pfd_tag_found = PCIE_TAG_ALL;
	fab->pfd_flags &= ~PCIE_FABRIC_F_COMPLEX;

	ddi_walk_devs(dip, pcie_fabric_feature_scan, fab);

	if ((fab->pfd_flags & PCIE_FABRIC_F_COMPLEX) != 0) {
		fab->pfd_tag_act = PCIE_TAG_5B;
		fab->pfd_mps_act = PCIE_DEVCAP_MAX_PAYLOAD_128;
	} else {
		fab->pfd_tag_act = fab->pfd_tag_found;
		fab->pfd_mps_act = fab->pfd_mps_found;
	}

	ddi_walk_devs(dip, pcie_fabric_feature_set, fab);

	fab->pfd_flags &= ~PCIE_FABRIC_F_SCANNING;
	ndi_devi_exit(pdip, circular_count);
}
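
/*
 * Illustrative sketch only; it is not part of the driver and is compiled out
 * of kernel builds. It models the "start at the maximum and chip away" scan
 * that pcie_fabric_setup() performs, using a flat array in place of the
 * devinfo tree walk. Every ex_* name and EX_* constant below is a hypothetical
 * stand-in invented for this example, and the empty tag mask used for the
 * complex case is a simplification of PCIE_TAG_5B. The one fact taken from the
 * PCIe specification is the payload encoding: a value of n means 128 << n
 * bytes, so 0 is 128 bytes and 5 is 4096 bytes.
 */
#ifndef _KERNEL
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_TAG_8B	0x1u
#define	EX_TAG_10B	0x2u
#define	EX_TAG_14B	0x4u
#define	EX_TAG_ALL	(EX_TAG_8B | EX_TAG_10B | EX_TAG_14B)
#define	EX_MPS_128	0u
#define	EX_MPS_4096	5u

typedef struct ex_dev {
	uint8_t	ed_mps;		/* encoded max payload size supported */
	uint8_t	ed_tags;	/* mask of EX_TAG_* the device supports */
	bool	ed_complex;	/* e.g. a hotpluggable non-root-port bridge */
} ex_dev_t;

typedef struct ex_fabric {
	uint8_t	ef_mps;		/* encoded payload size to program */
	uint8_t	ef_tags;	/* mask of EX_TAG_* to enable */
	bool	ef_complex;	/* fell back to the safe minimums */
} ex_fabric_t;

/* Decode a payload size field into bytes. */
static uint32_t
ex_mps_bytes(uint8_t enc)
{
	assert(enc <= EX_MPS_4096);
	return (128u << enc);
}

/*
 * Take the smallest payload size and the intersection of the tag support
 * across all devices. A single complex device ends the scan and forces the
 * minimum settings, mirroring DDI_WALK_TERMINATE in the real walk.
 */
static ex_fabric_t
ex_fabric_scan(const ex_dev_t *devs, size_t ndevs)
{
	ex_fabric_t fab = { EX_MPS_4096, EX_TAG_ALL, false };

	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].ed_complex) {
			fab.ef_complex = true;
			break;
		}
		if (devs[i].ed_mps < fab.ef_mps)
			fab.ef_mps = devs[i].ed_mps;
		fab.ef_tags &= devs[i].ed_tags;
	}

	if (fab.ef_complex) {
		fab.ef_mps = EX_MPS_128;
		fab.ef_tags = 0;
	}
	return (fab);
}
#endif /* !_KERNEL */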