1 /* 2 * CDDL HEADER START 3 * 4 * The contents of this file are subject to the terms of the 5 * Common Development and Distribution License (the "License"). 6 * You may not use this file except in compliance with the License. 7 * 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9 * or http://www.opensolaris.org/os/licensing. 10 * See the License for the specific language governing permissions 11 * and limitations under the License. 12 * 13 * When distributing Covered Code, include this CDDL HEADER in each 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15 * If applicable, add the following below this CDDL HEADER, with the 16 * fields enclosed by brackets "[]" replaced with your own identifying 17 * information: Portions Copyright [yyyy] [name of copyright owner] 18 * 19 * CDDL HEADER END 20 */ 21 22 /* 23 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. 24 * Copyright 2019 Joyent, Inc. 25 * Copyright 2022 Oxide Computer Company 26 */ 27 28 /* 29 * PCIe Initialization 30 * ------------------- 31 * 32 * The PCIe subsystem is split about and initializes itself in a couple of 33 * different places. This is due to the platform-specific nature of initializing 34 * resources and the nature of the SPARC PROM and how that influenced the 35 * subsystem. Note that traditional PCI (mostly seen these days in Virtual 36 * Machines) follows most of the same basic path outlined here, but skips a 37 * large chunk of PCIe-specific initialization. 38 * 39 * First, there is an initial device discovery phase that is taken care of by 40 * the platform. This is where we discover the set of devices that are present 41 * at system power on. These devices may or may not be hot-pluggable. In 42 * particular, this happens in a platform-specific way right now. In general, we 43 * expect most discovery to be driven by scanning each bus, device, and 44 * function, and seeing what actually exists and responds to configuration space 45 * reads. This is driven via pci_boot.c on x86. This may be seeded by something 46 * like device tree, a PROM, supplemented with ACPI, or by knowledge that the 47 * underlying platform has. 48 * 49 * As a part of this discovery process, the full set of resources that exist in 50 * the system for PCIe are: 51 * 52 * o PCI buses 53 * o Prefetchable Memory 54 * o Non-prefetchable memory 55 * o I/O ports 56 * 57 * This process is driven by a platform's PCI platform Resource Discovery (PRD) 58 * module. The PRD definitions can be found in <sys/plat/pci_prd.h> and are used 59 * to discover these resources, which will be converted into the initial set of 60 * the standard properties in the system: 'regs', 'available', 'ranges', etc. 61 * Currently it is up to platform-specific code (which should ideally be 62 * consolidated at some point) to set up all these properties. 63 * 64 * As a part of the discovery process, the platform code will create a device 65 * node (dev_info_t) for each discovered function and will create a PCIe nexus 66 * for each overall root complex that exists in the system. Most root complexes 67 * will have multiple root ports, each of which is the foundation of an 68 * independent PCIe bus due to the point-to-point nature of PCIe. When a root 69 * complex is found, a nexus driver such as npe (Nexus for PCIe Express) is 70 * attached. In the case of a non-PCIe-capable system this is where the older 71 * pci nexus driver would be used instead. 72 * 73 * To track data about a given device on a bus, a 'pcie_bus_t' structure is 74 * created for and assigned to every PCIe-based dev_info_t. This can be used to 75 * find the root port and get basic information about the device, its faults, 76 * and related information. This contains pointers to the corresponding root 77 * port as well. 78 * 79 * A root complex has its pcie_bus_t initialized as part of the device discovery 80 * process. That is, because we're trying to bootstrap the actual tree and most 81 * platforms don't have a representation for this that's explicitly 82 * discoverable, this is created manually. See callers of pcie_rc_init_bus(). 83 * 84 * For other devices, bridges, and switches, the process is split into two. 85 * There is an initial pcie_bus_t that is created which will exist before we go 86 * through the actual driver attachment process. For example, on x86 this is 87 * done as part of the device and function discovery. The second pass of 88 * initialization is done only after the nexus driver actually is attached and 89 * it goes through and finishes processing all of its children. 90 * 91 * Child Initialization 92 * -------------------- 93 * 94 * Generally speaking, the platform will first enumerate all PCIe devices that 95 * are in the sytem before it actually creates a device tree. This is part of 96 * the bus/device/function scanning that is performed and from that dev_info_t 97 * nodes are created for each discovered device and are inserted into the 98 * broader device tree. Later in boot, the actual device tree is walked and the 99 * nodes go through the standard dev_info_t initialization process (DS_PROTO, 100 * DS_LINKED, DS_BOUND, etc.). 101 * 102 * PCIe-specific initialization can roughly be broken into the following pieces: 103 * 104 * 1. Platform initial discovery and resource assignment 105 * 2. The pcie_bus_t initialization 106 * 3. Nexus driver child initialization 107 * 4. Fabric initialization 108 * 5. Device driver-specific initialization 109 * 110 * The first part of this (1) and (2) are discussed in the previous section. 111 * Part (1) in particular is a combination of the PRD (platform resource 112 * discovery) and general device initialization. After this, because we have a 113 * device tree, most of the standard nexus initialization happens. 114 * 115 * (5) is somewhat simple, so let's get into it before we discuss (3) and (4). 116 * This is the last thing that is called and that happens after all of the 117 * others are done. This is the logic that occurs in a driver's attach(9E) entry 118 * point. This is always device-specific and generally speaking should not be 119 * manipulating standard PCIe registers directly on their own. For example, the 120 * MSI/MSI-X, AER, Serial Number, etc. capabilities will be automatically dealt 121 * with by the framework in (3) and (4) below. In many cases, particularly 122 * things that are part of (4), adjusting them in the individual driver is not 123 * safe. 124 * 125 * Finally, let's talk about (3) and (4) as these are related. The NDI provides 126 * for a standard hook for a nexus to initialize its children. In our platforms, 127 * there are basically two possible PCIe nexus drivers: there is the generic 128 * pcieb -- PCIe bridge -- driver which is used for standard root ports, 129 * switches, etc. Then there is the platform-specific primary nexus driver, 130 * which is being slowly consolidated into a single one where it makes sense. An 131 * example of this is npe. 132 * 133 * Each of these has a child initialization function which is called from their 134 * DDI_CTLOPS_INITCHILD operation on the bus_ctl function pointer. This goes 135 * through and initializes a large number of different pieces of PCIe-based 136 * settings through the common pcie_initchild() function. This takes care of 137 * things like: 138 * 139 * o Advanced Error Reporting 140 * o Alternative Routing 141 * o Capturing information around link speed, width, serial numbers, etc. 142 * o Setting common properties around aborts 143 * 144 * There are a few caveats with this that need to be kept in mind: 145 * 146 * o A dev_info_t indicates a specific function. This means that a 147 * multi-function device will not all be initialized at the same time and 148 * there is no guarantee that all children will be initialized before one of 149 * them is attached. 150 * o A child is only initialized if we have found a driver that matches an 151 * alias in the dev_info_t's compatible array property. While a lot of 152 * multi-function devices are often multiple instances of the same thing 153 * (e.g. a multi-port NIC with a function / NIC), this is not always the 154 * case and one cannot make any assumptions here. 155 * 156 * This in turn leads to the next form of initialization that takes place in the 157 * case of (4). This is where we take care of things that need to be consistent 158 * across either entire devices or more generally across an entire root port and 159 * all of its children. There are a few different examples of this: 160 * 161 * o Setting the maximum packet size 162 * o Determining the tag width 163 * 164 * Note that features which are only based on function 0, such as ASPM (Active 165 * State Power Management), hardware autonomous width disable, etc. ultimately 166 * do not go through this path today. There are some implications here in that 167 * today several of these things are captured on functions which may not have 168 * any control here. This is an area of needed improvement. 169 * 170 * The settings in (4) are initialized in a common way, via 171 * pcie_fabric_setup(). This is called into from two different parts of 172 * the stack: 173 * 174 * 1. When we attach a root port, which is driven by pcieb. 175 * 2. When we have a hotplug event that adds a device. 176 * 177 * In general here we are going to use the term 'fabric' to refer to everything 178 * that is downstream of a root port. This corresponds to what the PCIe 179 * specification calls a 'hierarchy domain'. Strictly speaking, this is fine 180 * until peer-to-peer requests begin to happen that cause you to need to forward 181 * things across root ports. At that point the scope of the fabric increases and 182 * these settings become more complicated. We currently optimize for the much 183 * more common case, which is that each root port is effectively independent 184 * from a PCIe transaction routing perspective. 185 * 186 * Put differently, we use the term 'fabric' to refer to a set of PCIe devices 187 * that can route transactions to one another, which is generally constrained to 188 * everything under a root port and that root ports are independent. If this 189 * constraint changes, then all one needs to do is replace the discussion of the 190 * root port below with the broader root complex and system. 191 * 192 * A challenge with these settings is that once they're set and devices are 193 * actively making requests, we cannot really change them without resetting the 194 * links and cancelling all outstanding transactions via device resets. Because 195 * this is not something that we want to do, we instead look at how and when we 196 * set this to constrain what's going on. 197 * 198 * Because of this we basically say that if a given fabric has more than one 199 * hot-plug capable device that's encountered, then we have to use safe defaults 200 * (which we can allow an operator to tune eventually via pcieadm). If we have a 201 * mix of non-hotpluggable slots with downstream endpoints present and 202 * hot-pluggable slots, then we're in this case. If we don't have hot-pluggable 203 * slots, then we can have an arbitrarily complex setup. Let's look at a few of 204 * these visually: 205 * 206 * In the following diagrams, RP stands for Root Port, EP stands for Endpoint. 207 * If something is hot-pluggable, then we label it with (HP). 208 * 209 * (1) RP --> EP 210 * (2) RP --> Switch --> EP 211 * +--> EP 212 * +--> EP 213 * 214 * (3) RP --> Switch --> EP 215 * +--> EP 216 * +--> Switch --> EP 217 * +--> EP 218 * +--> EP 219 * 220 * 221 * (4) RP (HP) --> EP 222 * (5) RP (HP) --> Switch --> EP 223 * +--> EP 224 * +--> EP 225 * 226 * (6) RP --> Switch (HP) --> EP 227 * (7) RP (HP) --> Switch (HP) --> EP 228 * 229 * If we look at all of these, these are all cases where it's safe for us to set 230 * things based on all devices. (1), (2), and (3) are straightforward because 231 * they have no hot-pluggable elements. This means that nothing should come/go 232 * on the system and we can set up fabric-wide properties as part of the root 233 * port. 234 * 235 * Case (4) is the most standard one that we encounter for hot-plug. Here you 236 * have a root port directly connected to an endpoint. The most common example 237 * would be an NVMe device plugged into a root port. Case (5) is interesting to 238 * highlight. While there is a switch and multiple endpoints there, they are 239 * showing up as a unit. This ends up being a weirder variant of (4), but it is 240 * safe for us to set advanced properties because we can figure out what the 241 * total set should be. 242 * 243 * Now, the more interesting bits here are (6) and (7). The reason that (6) 244 * works is that ultimately there is only a single down-stream port here that is 245 * hot-pluggable and all non-hotpluggable ports do not have a device present, 246 * which suggests that they will never have a device present. (7) also could be 247 * made to work by making the observation that if there's truly only one 248 * endpoint in a fabric, it doesn't matter how many switches there are that are 249 * hot-pluggable. This would only hold if we can assume for some reason that no 250 * other endpoints could be added. 251 * 252 * In turn, let's look at several cases that we believe aren't safe: 253 * 254 * (8) RP --> Switch --> EP 255 * +--> EP 256 * (HP) +--> EP 257 * 258 * (9) RP --> Switch (HP) +--> EP 259 * (HP) +--> EP 260 * 261 * (10) RP (HP) --> Switch (HP) +--> EP 262 * (HP) +--> EP 263 * 264 * All of these are situations where it's much more explicitly unsafe. Let's 265 * take (8). The problem here is that the devices on the non-hotpluggable 266 * downstream switches are always there and we should assume all device drivers 267 * will be active and performing I/O when the hot-pluggable slot changes. If the 268 * hot-pluggable slot has a lower max payload size, then we're mostly out of 269 * luck. The case of (9) is very similar to (8), just that we have more hot-plug 270 * capable slots. 271 * 272 * Finally (10) is a case of multiple instances of hotplug. (9) and (10) are the 273 * more general case of (6) and (7). While we can try to detect (6) and (7) more 274 * generally or try to make it safe, we're going to start with a simpler form of 275 * detection for this, which roughly follows the following rules: 276 * 277 * o If there are no hot-pluggable slots in an entire fabric, then we can set 278 * all fabric properties based on device capabilities. 279 * o If we encounter a hot-pluggable slot, we can only set fabric properties 280 * based on device capabilities if: 281 * 282 * 1. The hotpluggable slot is a root port. 283 * 2. There are no other hotpluggable devices downstream of it. 284 * 285 * Otherwise, if neither of the above is true, then we must use the basic PCIe 286 * defaults for various fabric-wide properties (discussed below). Even in these 287 * more complicated cases, device-specific properties such as the configuration 288 * of AERs, ASPM, etc. are still handled in the general pcie_init_bus() and 289 * related discussed earlier here. 290 * 291 * Because the only fabrics that we'll change are those that correspond to root 292 * ports, we will only call into the actual fabric feature setup when one of 293 * those changes. This has the side effect of simplifying locking. When we make 294 * changes here we need to be able to hold the entire device tree under the root 295 * port (including the root port and its parent). This is much harder to do 296 * safely when starting in the middle of the tree. 297 * 298 * Handling of Specific Properties 299 * ------------------------------- 300 * 301 * This section goes into the rationale behind how we initialize and program 302 * various parts of the PCIe stack. 303 * 304 * 5-, 8-, 10- AND 14-BIT TAGS 305 * 306 * Tags are part of PCIe transactions and when combined with a device identifier 307 * are used to uniquely identify a transaction. In PCIe parlance, a Requester 308 * (someone who initiates a PCIe request) sets a unique tag in the request and 309 * the Completer (someone who processes and responds to a PCIe request) echoes 310 * the tag back. This means that a requester generally is responsible for 311 * ensuring that they don't reuse a tag between transactions. 312 * 313 * Thus the number of tags that a device has relates to the number of 314 * outstanding transactions that it can have, which are usually tied to the 315 * number of outstanding DMA transfers. The size of these transactions is also 316 * then scoped by the handling of the Maximum Packet Payload. 317 * 318 * In PCIe 1.0, devices default to a 5-bit tag. There was also an option to 319 * support an 8-bit tag. The 8-bit extended tag did not distinguish between a 320 * Requester or Completer. There was a bit to indicate device support of 8-bit 321 * tags in the Device Capabilities Register of the PCIe Capability and a 322 * separate bit to enable it in the Device Control Register of the PCIe 323 * Capability. 324 * 325 * In PCIe 4.0, support for a 10-bit tag was added. The specification broke 326 * apart the support bit into multiple pieces. In particular, in the Device 327 * Capabilities 2 register of the PCIe Capability there is a separate bit to 328 * indicate whether the device supports 10-bit completions and 10-bit requests. 329 * All PCIe 4.0 compliant devices are required to support 10-bit tags if they 330 * operate at 16.0 GT/s speed (a PCIe Gen 4 compliant device does not have to 331 * operate at Gen 4 speeds). 332 * 333 * This allows a device to support 10-bit completions but not 10-bit requests. 334 * A device that supports 10-bit requests is required to support 10-bit 335 * completions. There is no ability to enable or disable 10-bit completion 336 * support in the Device Capabilities 2 register. There is only a bit to enable 337 * 10-bit requests. This distinction makes our life easier as this means that as 338 * long as the entire fabric supports 10-bit completions, it doesn't matter if 339 * not all devices support 10-bit requests and we can enable them as required. 340 * More on this in a bit. 341 * 342 * In PCIe 6.0, another set of bits was added for 14-bit tags. These follow the 343 * same pattern as the 10-bit tags. The biggest difference is that the 344 * capabilities and control for these are found in the Device Capabilities 3 345 * and Device Control 3 register of the Device 3 Extended Capability. Similar to 346 * what we see with 10-bit tags, requesters are required to support the 347 * completer capability. The only control bit is for whether or not they enable 348 * a 14-bit requester. 349 * 350 * PCIe switches which sit between root ports and endpoints and show up to 351 * software as a set of bridges. Bridges generally don't have to know about tags 352 * as they are usually neither requesters or completers (unless directly talking 353 * to the bridge instance). That is they are generally required to forward 354 * packets without modifying them. This works until we deal with switch error 355 * handling. At that point, the switch may try to interpret the transaction and 356 * if it doesn't understand the tagging scheme in use, return the transaction to 357 * with the wrong tag and also an incorrectly diagnosed error (usually a 358 * malformed TLP). 359 * 360 * With all this, we construct a somewhat simple policy of how and when we 361 * enable extended tags: 362 * 363 * o If we have a complex hotplug-capable fabric (based on the discussion 364 * earlier in fabric-specific settings), then we cannot enable any of the 365 * 8-bit, 10-bit, and 14-bit tagging features. This is due to the issues 366 * with intermediate PCIe switches and related. 367 * 368 * o If every device supports 8-bit capable tags, then we will go through and 369 * enable those everywhere. 370 * 371 * o If every device supports 10-bit capable completions, then we will enable 372 * 10-bit requester on every device that supports it. 373 * 374 * o If every device supports 14-bit capable completions, then we will enable 375 * 14-bit requesters on every device that supports it. 376 * 377 * This is the simpler end of the policy and one that is relatively easy to 378 * implement. While we could attempt to relax the constraint that every device 379 * in the fabric implement these features by making assumptions about peer-to- 380 * peer requests (that is devices at the same layer in the tree won't talk to 381 * one another), that is a lot of complexity. For now, we leave such an 382 * implementation to those who need it in the future. 383 * 384 * MAX PAYLOAD SIZE 385 * 386 * When performing transactions on the PCIe bus, a given transaction has a 387 * maximum allowed size. This size is called the MPS or 'Maximum Payload Size'. 388 * A given device reports its maximum supported size in the Device Capabilities 389 * register of the PCIe Capability. It is then set in the Device Control 390 * register. 391 * 392 * One of the challenges with this value is that different functions of a device 393 * have independent values, but strictly speaking are required to actually have 394 * the same value programmed in all of them lest device behavior goes awry. When 395 * a device has the ARI (alternative routing ID) capability enabled, then only 396 * function 0 controls the actual payload size. 397 * 398 * The settings for this need to be consistent throughout the fabric. A 399 * Transmitter is not allowed to create a TLP that exceeds its maximum packet 400 * size and a Receiver is not allowed to receive a packet that exceeds its 401 * maximum packet size. In all of these cases, this would result in something 402 * like a malformed TLP error. 403 * 404 * Effectively, this means that everything on a given fabric must have the same 405 * value programmed in its Device Control register for this value. While in the 406 * case of tags, switches generally weren't completers or requesters, here every 407 * device along the path is subject to this. This makes the actual value that we 408 * set throughout the fabric even more important and the constraints of hotplug 409 * even worse to deal with. 410 * 411 * Because a hotplug device can be inserted with any packet size, if we hit 412 * anything other than the simple hotplug cases discussed in the fabric-specific 413 * settings section, then we must use the smallest size of 128 byte payloads. 414 * This is because a device could be plugged in that supports something smaller 415 * than we had otherwise set. If there are other active devices, those could not 416 * be changed without quiescing the entire fabric. As such our algorithm is as 417 * follows: 418 * 419 * 1. Scan the entire fabric, keeping track of the smallest seen MPS in the 420 * Device Capabilities Register. 421 * 2. If we have a complex fabric, program each Device Control register with 422 * a 128 byte maximum payload size, otherwise, program it with the 423 * discovered value. 424 * 425 * 426 * MAX READ REQUEST SIZE 427 * 428 * The maximum read request size (mrrs) is a much more confusing thing when 429 * compared to the maximum payload size counterpart. The maximum payload size 430 * (MPS) above is what restricts the actual size of a TLP. The mrrs value 431 * is used to control part of the behavior of Memory Read Request, which is not 432 * strictly speaking subject to the MPS. A PCIe device is allowed to respond to 433 * a Memory Read Request with less bytes than were actually requested in a 434 * single completion. In general, the default size that a root complex and its 435 * root port will reply to are based around the length of a cache line. 436 * 437 * What this ultimately controls is the number of requests that the Requester 438 * has to make and trades off bandwidth, bus sharing, and related here. For 439 * example, if the maximum read request size is 4 KiB, then the requester would 440 * only issue a single read request asking for 4 KiB. It would still receive 441 * these as multiple packets in units of the MPS. If however, the maximum read 442 * request was only say 512 B, then it would need to make 8 separate requests, 443 * potentially increasing latency. On the other hand, if systems are relying on 444 * total requests for QoS, then it's important to set it to something that's 445 * closer to the actual MPS. 446 * 447 * Traditionally, the OS has not been the most straightforward about this. It's 448 * important to remember that setting this up is also somewhat in the realm of 449 * system firmware. Due to the PCI Firmware specification, the firmware may have 450 * set up a value for not just the MRRS but also the MPS. As such, our logic 451 * basically left the MRRS alone and used whatever the device had there as long 452 * as we weren't shrinking the device's MPS. If we were, then we'd set it to the 453 * MPS. If the device was a root port, then it was just left at a system wide 454 * and PCIe default of 512 bytes. 455 * 456 * If we survey firmware (which isn't easy due to its nature), we have seen most 457 * cases where the firmware just doesn't do anything and leaves it to the 458 * device's default, which is basically just the PCIe default, unless it has a 459 * specific knowledge of something like say wanting to do something for an NVMe 460 * device. The same is generally true of other systems, leaving it at its 461 * default unless otherwise set by a device driver. 462 * 463 * Because this value doesn't really have the same constraints as other fabric 464 * properties, this becomes much simpler and we instead opt to set it as part of 465 * the device node initialization. In addition, there are no real rules about 466 * different functions having different values here as it doesn't really impact 467 * the TLP processing the same way that the MPS does. 468 * 469 * While we should add a fuller way of setting this and allowing operator 470 * override of the MRRS based on things like device class, etc. that is driven 471 * by pcieadm, that is left to the future. For now we opt to that all devices 472 * are kept at their default (512 bytes or whatever firmware left behind) and we 473 * ensure that root ports always have the mrrs set to 512. 474 */ 475 476 #include <sys/sysmacros.h> 477 #include <sys/types.h> 478 #include <sys/kmem.h> 479 #include <sys/modctl.h> 480 #include <sys/ddi.h> 481 #include <sys/sunddi.h> 482 #include <sys/sunndi.h> 483 #include <sys/fm/protocol.h> 484 #include <sys/fm/util.h> 485 #include <sys/promif.h> 486 #include <sys/disp.h> 487 #include <sys/stat.h> 488 #include <sys/file.h> 489 #include <sys/pci_cap.h> 490 #include <sys/pci_impl.h> 491 #include <sys/pcie_impl.h> 492 #include <sys/hotplug/pci/pcie_hp.h> 493 #include <sys/hotplug/pci/pciehpc.h> 494 #include <sys/hotplug/pci/pcishpc.h> 495 #include <sys/hotplug/pci/pcicfg.h> 496 #include <sys/pci_cfgacc.h> 497 #include <sys/sysevent.h> 498 #include <sys/sysevent/eventdefs.h> 499 #include <sys/sysevent/pcie.h> 500 501 /* Local functions prototypes */ 502 static void pcie_init_pfd(dev_info_t *); 503 static void pcie_fini_pfd(dev_info_t *); 504 505 #if defined(__x86) 506 static void pcie_check_io_mem_range(ddi_acc_handle_t, boolean_t *, boolean_t *); 507 #endif /* defined(__x86) */ 508 509 #ifdef DEBUG 510 uint_t pcie_debug_flags = 0; 511 static void pcie_print_bus(pcie_bus_t *bus_p); 512 void pcie_dbg(char *fmt, ...); 513 #endif /* DEBUG */ 514 515 /* Variable to control default PCI-Express config settings */ 516 ushort_t pcie_command_default = 517 PCI_COMM_SERR_ENABLE | 518 PCI_COMM_WAIT_CYC_ENAB | 519 PCI_COMM_PARITY_DETECT | 520 PCI_COMM_ME | 521 PCI_COMM_MAE | 522 PCI_COMM_IO; 523 524 /* xxx_fw are bits that are controlled by FW and should not be modified */ 525 ushort_t pcie_command_default_fw = 526 PCI_COMM_SPEC_CYC | 527 PCI_COMM_MEMWR_INVAL | 528 PCI_COMM_PALETTE_SNOOP | 529 PCI_COMM_WAIT_CYC_ENAB | 530 0xF800; /* Reserved Bits */ 531 532 ushort_t pcie_bdg_command_default_fw = 533 PCI_BCNF_BCNTRL_ISA_ENABLE | 534 PCI_BCNF_BCNTRL_VGA_ENABLE | 535 0xF000; /* Reserved Bits */ 536 537 /* PCI-Express Base error defaults */ 538 ushort_t pcie_base_err_default = 539 PCIE_DEVCTL_CE_REPORTING_EN | 540 PCIE_DEVCTL_NFE_REPORTING_EN | 541 PCIE_DEVCTL_FE_REPORTING_EN | 542 PCIE_DEVCTL_UR_REPORTING_EN; 543 544 /* PCI-Express Device Control Register */ 545 uint16_t pcie_devctl_default = PCIE_DEVCTL_RO_EN | 546 PCIE_DEVCTL_MAX_READ_REQ_512; 547 548 /* PCI-Express AER Root Control Register */ 549 #define PCIE_ROOT_SYS_ERR (PCIE_ROOTCTL_SYS_ERR_ON_CE_EN | \ 550 PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN | \ 551 PCIE_ROOTCTL_SYS_ERR_ON_FE_EN) 552 553 ushort_t pcie_root_ctrl_default = 554 PCIE_ROOTCTL_SYS_ERR_ON_CE_EN | 555 PCIE_ROOTCTL_SYS_ERR_ON_NFE_EN | 556 PCIE_ROOTCTL_SYS_ERR_ON_FE_EN; 557 558 /* PCI-Express Root Error Command Register */ 559 ushort_t pcie_root_error_cmd_default = 560 PCIE_AER_RE_CMD_CE_REP_EN | 561 PCIE_AER_RE_CMD_NFE_REP_EN | 562 PCIE_AER_RE_CMD_FE_REP_EN; 563 564 /* ECRC settings in the PCIe AER Control Register */ 565 uint32_t pcie_ecrc_value = 566 PCIE_AER_CTL_ECRC_GEN_ENA | 567 PCIE_AER_CTL_ECRC_CHECK_ENA; 568 569 /* 570 * If a particular platform wants to disable certain errors such as UR/MA, 571 * instead of using #defines have the platform's PCIe Root Complex driver set 572 * these masks using the pcie_get_XXX_mask and pcie_set_XXX_mask functions. For 573 * x86 the closest thing to a PCIe root complex driver is NPE. For SPARC the 574 * closest PCIe root complex driver is PX. 575 * 576 * pcie_serr_disable_flag : disable SERR only (in RCR and command reg) x86 577 * systems may want to disable SERR in general. For root ports, enabling SERR 578 * causes NMIs which are not handled and results in a watchdog timeout error. 579 */ 580 uint32_t pcie_aer_uce_mask = 0; /* AER UE Mask */ 581 uint32_t pcie_aer_ce_mask = 0; /* AER CE Mask */ 582 uint32_t pcie_aer_suce_mask = 0; /* AER Secondary UE Mask */ 583 uint32_t pcie_serr_disable_flag = 0; /* Disable SERR */ 584 585 /* Default severities needed for eversholt. Error handling doesn't care */ 586 uint32_t pcie_aer_uce_severity = PCIE_AER_UCE_MTLP | PCIE_AER_UCE_RO | \ 587 PCIE_AER_UCE_FCP | PCIE_AER_UCE_SD | PCIE_AER_UCE_DLP | \ 588 PCIE_AER_UCE_TRAINING; 589 uint32_t pcie_aer_suce_severity = PCIE_AER_SUCE_SERR_ASSERT | \ 590 PCIE_AER_SUCE_UC_ADDR_ERR | PCIE_AER_SUCE_UC_ATTR_ERR | \ 591 PCIE_AER_SUCE_USC_MSG_DATA_ERR; 592 593 int pcie_disable_ari = 0; 594 595 /* 596 * On some platforms, such as the AMD B450 chipset, we've seen an odd 597 * relationship between enabling link bandwidth notifications and AERs about 598 * ECRC errors. This provides a mechanism to disable it. 599 */ 600 int pcie_disable_lbw = 0; 601 602 /* 603 * Amount of time to wait for an in-progress retraining. The default is to try 604 * 500 times in 10ms chunks, thus a total of 5s. 605 */ 606 uint32_t pcie_link_retrain_count = 500; 607 uint32_t pcie_link_retrain_delay_ms = 10; 608 609 taskq_t *pcie_link_tq; 610 kmutex_t pcie_link_tq_mutex; 611 612 static void pcie_scan_mps(dev_info_t *rc_dip, dev_info_t *dip, 613 int *max_supported); 614 static int pcie_get_max_supported(dev_info_t *dip, void *arg); 615 static int pcie_map_phys(dev_info_t *dip, pci_regspec_t *phys_spec, 616 caddr_t *addrp, ddi_acc_handle_t *handlep); 617 static void pcie_unmap_phys(ddi_acc_handle_t *handlep, pci_regspec_t *ph); 618 static int pcie_link_bw_intr(dev_info_t *); 619 static void pcie_capture_speeds(dev_info_t *); 620 621 dev_info_t *pcie_get_rc_dip(dev_info_t *dip); 622 623 /* 624 * modload support 625 */ 626 627 static struct modlmisc modlmisc = { 628 &mod_miscops, /* Type of module */ 629 "PCI Express Framework Module" 630 }; 631 632 static struct modlinkage modlinkage = { 633 MODREV_1, 634 (void *)&modlmisc, 635 NULL 636 }; 637 638 /* 639 * Global Variables needed for a non-atomic version of ddi_fm_ereport_post. 640 * Currently used to send the pci.fabric ereports whose payload depends on the 641 * type of PCI device it is being sent for. 642 */ 643 char *pcie_nv_buf; 644 nv_alloc_t *pcie_nvap; 645 nvlist_t *pcie_nvl; 646 647 int 648 _init(void) 649 { 650 int rval; 651 652 pcie_nv_buf = kmem_alloc(ERPT_DATA_SZ, KM_SLEEP); 653 pcie_nvap = fm_nva_xcreate(pcie_nv_buf, ERPT_DATA_SZ); 654 pcie_nvl = fm_nvlist_create(pcie_nvap); 655 mutex_init(&pcie_link_tq_mutex, NULL, MUTEX_DRIVER, NULL); 656 657 if ((rval = mod_install(&modlinkage)) != 0) { 658 mutex_destroy(&pcie_link_tq_mutex); 659 fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN); 660 fm_nva_xdestroy(pcie_nvap); 661 kmem_free(pcie_nv_buf, ERPT_DATA_SZ); 662 } 663 return (rval); 664 } 665 666 int 667 _fini() 668 { 669 int rval; 670 671 if ((rval = mod_remove(&modlinkage)) == 0) { 672 if (pcie_link_tq != NULL) { 673 taskq_destroy(pcie_link_tq); 674 } 675 mutex_destroy(&pcie_link_tq_mutex); 676 fm_nvlist_destroy(pcie_nvl, FM_NVA_RETAIN); 677 fm_nva_xdestroy(pcie_nvap); 678 kmem_free(pcie_nv_buf, ERPT_DATA_SZ); 679 } 680 return (rval); 681 } 682 683 int 684 _info(struct modinfo *modinfop) 685 { 686 return (mod_info(&modlinkage, modinfop)); 687 } 688 689 /* ARGSUSED */ 690 int 691 pcie_init(dev_info_t *dip, caddr_t arg) 692 { 693 int ret = DDI_SUCCESS; 694 695 /* 696 * Our _init function is too early to create a taskq. Create the pcie 697 * link management taskq here now instead. 698 */ 699 mutex_enter(&pcie_link_tq_mutex); 700 if (pcie_link_tq == NULL) { 701 pcie_link_tq = taskq_create("pcie_link", 1, minclsyspri, 0, 0, 702 0); 703 } 704 mutex_exit(&pcie_link_tq_mutex); 705 706 707 /* 708 * Create a "devctl" minor node to support DEVCTL_DEVICE_* 709 * and DEVCTL_BUS_* ioctls to this bus. 710 */ 711 if ((ret = ddi_create_minor_node(dip, "devctl", S_IFCHR, 712 PCI_MINOR_NUM(ddi_get_instance(dip), PCI_DEVCTL_MINOR), 713 DDI_NT_NEXUS, 0)) != DDI_SUCCESS) { 714 PCIE_DBG("Failed to create devctl minor node for %s%d\n", 715 ddi_driver_name(dip), ddi_get_instance(dip)); 716 717 return (ret); 718 } 719 720 if ((ret = pcie_hp_init(dip, arg)) != DDI_SUCCESS) { 721 /* 722 * On some x86 platforms, we observed unexpected hotplug 723 * initialization failures in recent years. The known cause 724 * is a hardware issue: while the problem PCI bridges have 725 * the Hotplug Capable registers set, the machine actually 726 * does not implement the expected ACPI object. 727 * 728 * We don't want to stop PCI driver attach and system boot 729 * just because of this hotplug initialization failure. 730 * Continue with a debug message printed. 731 */ 732 PCIE_DBG("%s%d: Failed setting hotplug framework\n", 733 ddi_driver_name(dip), ddi_get_instance(dip)); 734 735 #if defined(__sparc) 736 ddi_remove_minor_node(dip, "devctl"); 737 738 return (ret); 739 #endif /* defined(__sparc) */ 740 } 741 742 return (DDI_SUCCESS); 743 } 744 745 /* ARGSUSED */ 746 int 747 pcie_uninit(dev_info_t *dip) 748 { 749 int ret = DDI_SUCCESS; 750 751 if (pcie_ari_is_enabled(dip) == PCIE_ARI_FORW_ENABLED) 752 (void) pcie_ari_disable(dip); 753 754 if ((ret = pcie_hp_uninit(dip)) != DDI_SUCCESS) { 755 PCIE_DBG("Failed to uninitialize hotplug for %s%d\n", 756 ddi_driver_name(dip), ddi_get_instance(dip)); 757 758 return (ret); 759 } 760 761 if (pcie_link_bw_supported(dip)) { 762 (void) pcie_link_bw_disable(dip); 763 } 764 765 ddi_remove_minor_node(dip, "devctl"); 766 767 return (ret); 768 } 769 770 /* 771 * PCIe module interface for enabling hotplug interrupt. 772 * 773 * It should be called after pcie_init() is done and bus driver's 774 * interrupt handlers have being attached. 775 */ 776 int 777 pcie_hpintr_enable(dev_info_t *dip) 778 { 779 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 780 pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip); 781 782 if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) { 783 (void) (ctrl_p->hc_ops.enable_hpc_intr)(ctrl_p); 784 } else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) { 785 (void) pcishpc_enable_irqs(ctrl_p); 786 } 787 return (DDI_SUCCESS); 788 } 789 790 /* 791 * PCIe module interface for disabling hotplug interrupt. 792 * 793 * It should be called before pcie_uninit() is called and bus driver's 794 * interrupt handlers is dettached. 795 */ 796 int 797 pcie_hpintr_disable(dev_info_t *dip) 798 { 799 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 800 pcie_hp_ctrl_t *ctrl_p = PCIE_GET_HP_CTRL(dip); 801 802 if (PCIE_IS_PCIE_HOTPLUG_ENABLED(bus_p)) { 803 (void) (ctrl_p->hc_ops.disable_hpc_intr)(ctrl_p); 804 } else if (PCIE_IS_PCI_HOTPLUG_ENABLED(bus_p)) { 805 (void) pcishpc_disable_irqs(ctrl_p); 806 } 807 return (DDI_SUCCESS); 808 } 809 810 /* ARGSUSED */ 811 int 812 pcie_intr(dev_info_t *dip) 813 { 814 int hp, lbw; 815 816 hp = pcie_hp_intr(dip); 817 lbw = pcie_link_bw_intr(dip); 818 819 if (hp == DDI_INTR_CLAIMED || lbw == DDI_INTR_CLAIMED) { 820 return (DDI_INTR_CLAIMED); 821 } 822 823 return (DDI_INTR_UNCLAIMED); 824 } 825 826 /* ARGSUSED */ 827 int 828 pcie_open(dev_info_t *dip, dev_t *devp, int flags, int otyp, cred_t *credp) 829 { 830 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 831 832 /* 833 * Make sure the open is for the right file type. 834 */ 835 if (otyp != OTYP_CHR) 836 return (EINVAL); 837 838 /* 839 * Handle the open by tracking the device state. 840 */ 841 if ((bus_p->bus_soft_state == PCI_SOFT_STATE_OPEN_EXCL) || 842 ((flags & FEXCL) && 843 (bus_p->bus_soft_state != PCI_SOFT_STATE_CLOSED))) { 844 return (EBUSY); 845 } 846 847 if (flags & FEXCL) 848 bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN_EXCL; 849 else 850 bus_p->bus_soft_state = PCI_SOFT_STATE_OPEN; 851 852 return (0); 853 } 854 855 /* ARGSUSED */ 856 int 857 pcie_close(dev_info_t *dip, dev_t dev, int flags, int otyp, cred_t *credp) 858 { 859 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 860 861 if (otyp != OTYP_CHR) 862 return (EINVAL); 863 864 bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED; 865 866 return (0); 867 } 868 869 /* ARGSUSED */ 870 int 871 pcie_ioctl(dev_info_t *dip, dev_t dev, int cmd, intptr_t arg, int mode, 872 cred_t *credp, int *rvalp) 873 { 874 struct devctl_iocdata *dcp; 875 uint_t bus_state; 876 int rv = DDI_SUCCESS; 877 878 /* 879 * We can use the generic implementation for devctl ioctl 880 */ 881 switch (cmd) { 882 case DEVCTL_DEVICE_GETSTATE: 883 case DEVCTL_DEVICE_ONLINE: 884 case DEVCTL_DEVICE_OFFLINE: 885 case DEVCTL_BUS_GETSTATE: 886 return (ndi_devctl_ioctl(dip, cmd, arg, mode, 0)); 887 default: 888 break; 889 } 890 891 /* 892 * read devctl ioctl data 893 */ 894 if (ndi_dc_allochdl((void *)arg, &dcp) != NDI_SUCCESS) 895 return (EFAULT); 896 897 switch (cmd) { 898 case DEVCTL_BUS_QUIESCE: 899 if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS) 900 if (bus_state == BUS_QUIESCED) 901 break; 902 (void) ndi_set_bus_state(dip, BUS_QUIESCED); 903 break; 904 case DEVCTL_BUS_UNQUIESCE: 905 if (ndi_get_bus_state(dip, &bus_state) == NDI_SUCCESS) 906 if (bus_state == BUS_ACTIVE) 907 break; 908 (void) ndi_set_bus_state(dip, BUS_ACTIVE); 909 break; 910 case DEVCTL_BUS_RESET: 911 case DEVCTL_BUS_RESETALL: 912 case DEVCTL_DEVICE_RESET: 913 rv = ENOTSUP; 914 break; 915 default: 916 rv = ENOTTY; 917 } 918 919 ndi_dc_freehdl(dcp); 920 return (rv); 921 } 922 923 /* ARGSUSED */ 924 int 925 pcie_prop_op(dev_t dev, dev_info_t *dip, ddi_prop_op_t prop_op, 926 int flags, char *name, caddr_t valuep, int *lengthp) 927 { 928 if (dev == DDI_DEV_T_ANY) 929 goto skip; 930 931 if (PCIE_IS_HOTPLUG_CAPABLE(dip) && 932 strcmp(name, "pci-occupant") == 0) { 933 int pci_dev = PCI_MINOR_NUM_TO_PCI_DEVNUM(getminor(dev)); 934 935 pcie_hp_create_occupant_props(dip, dev, pci_dev); 936 } 937 938 skip: 939 return (ddi_prop_op(dev, dip, prop_op, flags, name, valuep, lengthp)); 940 } 941 942 int 943 pcie_init_cfghdl(dev_info_t *cdip) 944 { 945 pcie_bus_t *bus_p; 946 ddi_acc_handle_t eh = NULL; 947 948 bus_p = PCIE_DIP2BUS(cdip); 949 if (bus_p == NULL) 950 return (DDI_FAILURE); 951 952 /* Create an config access special to error handling */ 953 if (pci_config_setup(cdip, &eh) != DDI_SUCCESS) { 954 cmn_err(CE_WARN, "Cannot setup config access" 955 " for BDF 0x%x\n", bus_p->bus_bdf); 956 return (DDI_FAILURE); 957 } 958 959 bus_p->bus_cfg_hdl = eh; 960 return (DDI_SUCCESS); 961 } 962 963 void 964 pcie_fini_cfghdl(dev_info_t *cdip) 965 { 966 pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip); 967 968 pci_config_teardown(&bus_p->bus_cfg_hdl); 969 } 970 971 void 972 pcie_determine_serial(dev_info_t *dip) 973 { 974 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 975 ddi_acc_handle_t h; 976 uint16_t cap; 977 uchar_t serial[8]; 978 uint32_t low, high; 979 980 if (!PCIE_IS_PCIE(bus_p)) 981 return; 982 983 h = bus_p->bus_cfg_hdl; 984 985 if ((PCI_CAP_LOCATE(h, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_SER), &cap)) == 986 DDI_FAILURE) 987 return; 988 989 high = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_UPPER_DW); 990 low = PCI_XCAP_GET32(h, 0, cap, PCIE_SER_SID_LOWER_DW); 991 992 /* 993 * Here, we're trying to figure out if we had an invalid PCIe read. From 994 * looking at the contents of the value, it can be hard to tell the 995 * difference between a value that has all 1s correctly versus if we had 996 * an error. In this case, we only assume it's invalid if both register 997 * reads are invalid. We also only use 32-bit reads as we're not sure if 998 * all devices will support these as 64-bit reads, while we know that 999 * they'll support these as 32-bit reads. 1000 */ 1001 if (high == PCI_EINVAL32 && low == PCI_EINVAL32) 1002 return; 1003 1004 serial[0] = low & 0xff; 1005 serial[1] = (low >> 8) & 0xff; 1006 serial[2] = (low >> 16) & 0xff; 1007 serial[3] = (low >> 24) & 0xff; 1008 serial[4] = high & 0xff; 1009 serial[5] = (high >> 8) & 0xff; 1010 serial[6] = (high >> 16) & 0xff; 1011 serial[7] = (high >> 24) & 0xff; 1012 1013 (void) ndi_prop_update_byte_array(DDI_DEV_T_NONE, dip, "pcie-serial", 1014 serial, sizeof (serial)); 1015 } 1016 1017 static void 1018 pcie_determine_aspm(dev_info_t *dip) 1019 { 1020 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 1021 uint32_t linkcap; 1022 uint16_t linkctl; 1023 1024 if (!PCIE_IS_PCIE(bus_p)) 1025 return; 1026 1027 linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP); 1028 linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL); 1029 1030 switch (linkcap & PCIE_LINKCAP_ASPM_SUP_MASK) { 1031 case PCIE_LINKCAP_ASPM_SUP_L0S: 1032 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1033 "pcie-aspm-support", "l0s"); 1034 break; 1035 case PCIE_LINKCAP_ASPM_SUP_L1: 1036 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1037 "pcie-aspm-support", "l1"); 1038 break; 1039 case PCIE_LINKCAP_ASPM_SUP_L0S_L1: 1040 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1041 "pcie-aspm-support", "l0s,l1"); 1042 break; 1043 default: 1044 return; 1045 } 1046 1047 switch (linkctl & PCIE_LINKCTL_ASPM_CTL_MASK) { 1048 case PCIE_LINKCTL_ASPM_CTL_DIS: 1049 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1050 "pcie-aspm-state", "disabled"); 1051 break; 1052 case PCIE_LINKCTL_ASPM_CTL_L0S: 1053 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1054 "pcie-aspm-state", "l0s"); 1055 break; 1056 case PCIE_LINKCTL_ASPM_CTL_L1: 1057 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1058 "pcie-aspm-state", "l1"); 1059 break; 1060 case PCIE_LINKCTL_ASPM_CTL_L0S_L1: 1061 (void) ndi_prop_update_string(DDI_DEV_T_NONE, dip, 1062 "pcie-aspm-state", "l0s,l1"); 1063 break; 1064 } 1065 } 1066 1067 /* 1068 * PCI-Express child device initialization. Note, this only will be called on a 1069 * device or function if we actually attach a device driver to it. 1070 * 1071 * This function enables generic pci-express interrupts and error handling. 1072 * Note, tagging, the max packet size, and related are all set up before this 1073 * point and is performed in pcie_fabric_setup(). 1074 * 1075 * @param pdip root dip (root nexus's dip) 1076 * @param cdip child's dip (device's dip) 1077 * @return DDI_SUCCESS or DDI_FAILURE 1078 */ 1079 /* ARGSUSED */ 1080 int 1081 pcie_initchild(dev_info_t *cdip) 1082 { 1083 uint16_t tmp16, reg16; 1084 pcie_bus_t *bus_p; 1085 uint32_t devid, venid; 1086 1087 bus_p = PCIE_DIP2BUS(cdip); 1088 if (bus_p == NULL) { 1089 PCIE_DBG("%s: BUS not found.\n", 1090 ddi_driver_name(cdip)); 1091 1092 return (DDI_FAILURE); 1093 } 1094 1095 if (pcie_init_cfghdl(cdip) != DDI_SUCCESS) 1096 return (DDI_FAILURE); 1097 1098 /* 1099 * Update pcie_bus_t with real Vendor Id Device Id. 1100 * 1101 * For assigned devices in IOV environment, the OBP will return 1102 * faked device id/vendor id on configration read and for both 1103 * properties in root domain. translate_devid() function will 1104 * update the properties with real device-id/vendor-id on such 1105 * platforms, so that we can utilize the properties here to get 1106 * real device-id/vendor-id and overwrite the faked ids. 1107 * 1108 * For unassigned devices or devices in non-IOV environment, the 1109 * operation below won't make a difference. 1110 * 1111 * The IOV implementation only supports assignment of PCIE 1112 * endpoint devices. Devices under pci-pci bridges don't need 1113 * operation like this. 1114 */ 1115 devid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS, 1116 "device-id", -1); 1117 venid = ddi_prop_get_int(DDI_DEV_T_ANY, cdip, DDI_PROP_DONTPASS, 1118 "vendor-id", -1); 1119 bus_p->bus_dev_ven_id = (devid << 16) | (venid & 0xffff); 1120 1121 /* Clear the device's status register */ 1122 reg16 = PCIE_GET(16, bus_p, PCI_CONF_STAT); 1123 PCIE_PUT(16, bus_p, PCI_CONF_STAT, reg16); 1124 1125 /* Setup the device's command register */ 1126 reg16 = PCIE_GET(16, bus_p, PCI_CONF_COMM); 1127 tmp16 = (reg16 & pcie_command_default_fw) | pcie_command_default; 1128 1129 #if defined(__x86) 1130 boolean_t empty_io_range = B_FALSE; 1131 boolean_t empty_mem_range = B_FALSE; 1132 /* 1133 * Check for empty IO and Mem ranges on bridges. If so disable IO/Mem 1134 * access as it can cause a hang if enabled. 1135 */ 1136 pcie_check_io_mem_range(bus_p->bus_cfg_hdl, &empty_io_range, 1137 &empty_mem_range); 1138 if ((empty_io_range == B_TRUE) && 1139 (pcie_command_default & PCI_COMM_IO)) { 1140 tmp16 &= ~PCI_COMM_IO; 1141 PCIE_DBG("No I/O range found for %s, bdf 0x%x\n", 1142 ddi_driver_name(cdip), bus_p->bus_bdf); 1143 } 1144 if ((empty_mem_range == B_TRUE) && 1145 (pcie_command_default & PCI_COMM_MAE)) { 1146 tmp16 &= ~PCI_COMM_MAE; 1147 PCIE_DBG("No Mem range found for %s, bdf 0x%x\n", 1148 ddi_driver_name(cdip), bus_p->bus_bdf); 1149 } 1150 #endif /* defined(__x86) */ 1151 1152 if (pcie_serr_disable_flag && PCIE_IS_PCIE(bus_p)) 1153 tmp16 &= ~PCI_COMM_SERR_ENABLE; 1154 1155 PCIE_PUT(16, bus_p, PCI_CONF_COMM, tmp16); 1156 PCIE_DBG_CFG(cdip, bus_p, "COMMAND", 16, PCI_CONF_COMM, reg16); 1157 1158 /* 1159 * If the device has a bus control register then program it 1160 * based on the settings in the command register. 1161 */ 1162 if (PCIE_IS_BDG(bus_p)) { 1163 /* Clear the device's secondary status register */ 1164 reg16 = PCIE_GET(16, bus_p, PCI_BCNF_SEC_STATUS); 1165 PCIE_PUT(16, bus_p, PCI_BCNF_SEC_STATUS, reg16); 1166 1167 /* Setup the device's secondary command register */ 1168 reg16 = PCIE_GET(16, bus_p, PCI_BCNF_BCNTRL); 1169 tmp16 = (reg16 & pcie_bdg_command_default_fw); 1170 1171 tmp16 |= PCI_BCNF_BCNTRL_SERR_ENABLE; 1172 /* 1173 * Workaround for this Nvidia bridge. Don't enable the SERR 1174 * enable bit in the bridge control register as it could lead to 1175 * bogus NMIs. 1176 */ 1177 if (bus_p->bus_dev_ven_id == 0x037010DE) 1178 tmp16 &= ~PCI_BCNF_BCNTRL_SERR_ENABLE; 1179 1180 if (pcie_command_default & PCI_COMM_PARITY_DETECT) 1181 tmp16 |= PCI_BCNF_BCNTRL_PARITY_ENABLE; 1182 1183 /* 1184 * Enable Master Abort Mode only if URs have not been masked. 1185 * For PCI and PCIe-PCI bridges, enabling this bit causes a 1186 * Master Aborts/UR to be forwarded as a UR/TA or SERR. If this 1187 * bit is masked, posted requests are dropped and non-posted 1188 * requests are returned with -1. 1189 */ 1190 if (pcie_aer_uce_mask & PCIE_AER_UCE_UR) 1191 tmp16 &= ~PCI_BCNF_BCNTRL_MAST_AB_MODE; 1192 else 1193 tmp16 |= PCI_BCNF_BCNTRL_MAST_AB_MODE; 1194 PCIE_PUT(16, bus_p, PCI_BCNF_BCNTRL, tmp16); 1195 PCIE_DBG_CFG(cdip, bus_p, "SEC CMD", 16, PCI_BCNF_BCNTRL, 1196 reg16); 1197 } 1198 1199 if (PCIE_IS_PCIE(bus_p)) { 1200 /* Setup PCIe device control register */ 1201 reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL); 1202 /* note: MPS/MRRS are initialized in pcie_initchild_mps() */ 1203 tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK | 1204 PCIE_DEVCTL_MAX_PAYLOAD_MASK)) | 1205 (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK | 1206 PCIE_DEVCTL_MAX_PAYLOAD_MASK)); 1207 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16); 1208 PCIE_DBG_CAP(cdip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16); 1209 1210 /* Enable PCIe errors */ 1211 pcie_enable_errors(cdip); 1212 1213 pcie_determine_serial(cdip); 1214 1215 pcie_determine_aspm(cdip); 1216 1217 pcie_capture_speeds(cdip); 1218 } 1219 1220 bus_p->bus_ari = B_FALSE; 1221 if ((pcie_ari_is_enabled(ddi_get_parent(cdip)) 1222 == PCIE_ARI_FORW_ENABLED) && (pcie_ari_device(cdip) 1223 == PCIE_ARI_DEVICE)) { 1224 bus_p->bus_ari = B_TRUE; 1225 } 1226 1227 return (DDI_SUCCESS); 1228 } 1229 1230 static void 1231 pcie_init_pfd(dev_info_t *dip) 1232 { 1233 pf_data_t *pfd_p = PCIE_ZALLOC(pf_data_t); 1234 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 1235 1236 PCIE_DIP2PFD(dip) = pfd_p; 1237 1238 pfd_p->pe_bus_p = bus_p; 1239 pfd_p->pe_severity_flags = 0; 1240 pfd_p->pe_severity_mask = 0; 1241 pfd_p->pe_orig_severity_flags = 0; 1242 pfd_p->pe_lock = B_FALSE; 1243 pfd_p->pe_valid = B_FALSE; 1244 1245 /* Allocate the root fault struct for both RC and RP */ 1246 if (PCIE_IS_ROOT(bus_p)) { 1247 PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t); 1248 PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF; 1249 PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t); 1250 } 1251 1252 PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t); 1253 PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t); 1254 PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF; 1255 1256 if (PCIE_IS_BDG(bus_p)) 1257 PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t); 1258 1259 if (PCIE_IS_PCIE(bus_p)) { 1260 PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t); 1261 1262 if (PCIE_IS_RP(bus_p)) 1263 PCIE_RP_REG(pfd_p) = 1264 PCIE_ZALLOC(pf_pcie_rp_err_regs_t); 1265 1266 PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t); 1267 PCIE_ADV_REG(pfd_p)->pcie_ue_tgt_bdf = PCIE_INVALID_BDF; 1268 1269 if (PCIE_IS_RP(bus_p)) { 1270 PCIE_ADV_RP_REG(pfd_p) = 1271 PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t); 1272 PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id = 1273 PCIE_INVALID_BDF; 1274 PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id = 1275 PCIE_INVALID_BDF; 1276 } else if (PCIE_IS_PCIE_BDG(bus_p)) { 1277 PCIE_ADV_BDG_REG(pfd_p) = 1278 PCIE_ZALLOC(pf_pcie_adv_bdg_err_regs_t); 1279 PCIE_ADV_BDG_REG(pfd_p)->pcie_sue_tgt_bdf = 1280 PCIE_INVALID_BDF; 1281 } 1282 1283 if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) { 1284 PCIX_BDG_ERR_REG(pfd_p) = 1285 PCIE_ZALLOC(pf_pcix_bdg_err_regs_t); 1286 1287 if (PCIX_ECC_VERSION_CHECK(bus_p)) { 1288 PCIX_BDG_ECC_REG(pfd_p, 0) = 1289 PCIE_ZALLOC(pf_pcix_ecc_regs_t); 1290 PCIX_BDG_ECC_REG(pfd_p, 1) = 1291 PCIE_ZALLOC(pf_pcix_ecc_regs_t); 1292 } 1293 } 1294 1295 PCIE_SLOT_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_slot_regs_t); 1296 PCIE_SLOT_REG(pfd_p)->pcie_slot_regs_valid = B_FALSE; 1297 PCIE_SLOT_REG(pfd_p)->pcie_slot_cap = 0; 1298 PCIE_SLOT_REG(pfd_p)->pcie_slot_control = 0; 1299 PCIE_SLOT_REG(pfd_p)->pcie_slot_status = 0; 1300 1301 } else if (PCIE_IS_PCIX(bus_p)) { 1302 if (PCIE_IS_BDG(bus_p)) { 1303 PCIX_BDG_ERR_REG(pfd_p) = 1304 PCIE_ZALLOC(pf_pcix_bdg_err_regs_t); 1305 1306 if (PCIX_ECC_VERSION_CHECK(bus_p)) { 1307 PCIX_BDG_ECC_REG(pfd_p, 0) = 1308 PCIE_ZALLOC(pf_pcix_ecc_regs_t); 1309 PCIX_BDG_ECC_REG(pfd_p, 1) = 1310 PCIE_ZALLOC(pf_pcix_ecc_regs_t); 1311 } 1312 } else { 1313 PCIX_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcix_err_regs_t); 1314 1315 if (PCIX_ECC_VERSION_CHECK(bus_p)) 1316 PCIX_ECC_REG(pfd_p) = 1317 PCIE_ZALLOC(pf_pcix_ecc_regs_t); 1318 } 1319 } 1320 } 1321 1322 static void 1323 pcie_fini_pfd(dev_info_t *dip) 1324 { 1325 pf_data_t *pfd_p = PCIE_DIP2PFD(dip); 1326 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 1327 1328 if (PCIE_IS_PCIE(bus_p)) { 1329 if (PCIE_IS_PCIE_BDG(bus_p) && PCIE_IS_PCIX(bus_p)) { 1330 if (PCIX_ECC_VERSION_CHECK(bus_p)) { 1331 kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0), 1332 sizeof (pf_pcix_ecc_regs_t)); 1333 kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1), 1334 sizeof (pf_pcix_ecc_regs_t)); 1335 } 1336 1337 kmem_free(PCIX_BDG_ERR_REG(pfd_p), 1338 sizeof (pf_pcix_bdg_err_regs_t)); 1339 } 1340 1341 if (PCIE_IS_RP(bus_p)) 1342 kmem_free(PCIE_ADV_RP_REG(pfd_p), 1343 sizeof (pf_pcie_adv_rp_err_regs_t)); 1344 else if (PCIE_IS_PCIE_BDG(bus_p)) 1345 kmem_free(PCIE_ADV_BDG_REG(pfd_p), 1346 sizeof (pf_pcie_adv_bdg_err_regs_t)); 1347 1348 kmem_free(PCIE_ADV_REG(pfd_p), 1349 sizeof (pf_pcie_adv_err_regs_t)); 1350 1351 if (PCIE_IS_RP(bus_p)) 1352 kmem_free(PCIE_RP_REG(pfd_p), 1353 sizeof (pf_pcie_rp_err_regs_t)); 1354 1355 kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t)); 1356 } else if (PCIE_IS_PCIX(bus_p)) { 1357 if (PCIE_IS_BDG(bus_p)) { 1358 if (PCIX_ECC_VERSION_CHECK(bus_p)) { 1359 kmem_free(PCIX_BDG_ECC_REG(pfd_p, 0), 1360 sizeof (pf_pcix_ecc_regs_t)); 1361 kmem_free(PCIX_BDG_ECC_REG(pfd_p, 1), 1362 sizeof (pf_pcix_ecc_regs_t)); 1363 } 1364 1365 kmem_free(PCIX_BDG_ERR_REG(pfd_p), 1366 sizeof (pf_pcix_bdg_err_regs_t)); 1367 } else { 1368 if (PCIX_ECC_VERSION_CHECK(bus_p)) 1369 kmem_free(PCIX_ECC_REG(pfd_p), 1370 sizeof (pf_pcix_ecc_regs_t)); 1371 1372 kmem_free(PCIX_ERR_REG(pfd_p), 1373 sizeof (pf_pcix_err_regs_t)); 1374 } 1375 } 1376 1377 if (PCIE_IS_BDG(bus_p)) 1378 kmem_free(PCI_BDG_ERR_REG(pfd_p), 1379 sizeof (pf_pci_bdg_err_regs_t)); 1380 1381 kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t)); 1382 kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t)); 1383 1384 if (PCIE_IS_ROOT(bus_p)) { 1385 kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t)); 1386 kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t)); 1387 } 1388 1389 kmem_free(PCIE_DIP2PFD(dip), sizeof (pf_data_t)); 1390 1391 PCIE_DIP2PFD(dip) = NULL; 1392 } 1393 1394 1395 /* 1396 * Special functions to allocate pf_data_t's for PCIe root complexes. 1397 * Note: Root Complex not Root Port 1398 */ 1399 void 1400 pcie_rc_init_pfd(dev_info_t *dip, pf_data_t *pfd_p) 1401 { 1402 pfd_p->pe_bus_p = PCIE_DIP2DOWNBUS(dip); 1403 pfd_p->pe_severity_flags = 0; 1404 pfd_p->pe_severity_mask = 0; 1405 pfd_p->pe_orig_severity_flags = 0; 1406 pfd_p->pe_lock = B_FALSE; 1407 pfd_p->pe_valid = B_FALSE; 1408 1409 PCIE_ROOT_FAULT(pfd_p) = PCIE_ZALLOC(pf_root_fault_t); 1410 PCIE_ROOT_FAULT(pfd_p)->scan_bdf = PCIE_INVALID_BDF; 1411 PCIE_ROOT_EH_SRC(pfd_p) = PCIE_ZALLOC(pf_root_eh_src_t); 1412 PCI_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_err_regs_t); 1413 PFD_AFFECTED_DEV(pfd_p) = PCIE_ZALLOC(pf_affected_dev_t); 1414 PFD_AFFECTED_DEV(pfd_p)->pe_affected_bdf = PCIE_INVALID_BDF; 1415 PCI_BDG_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pci_bdg_err_regs_t); 1416 PCIE_ERR_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_err_regs_t); 1417 PCIE_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_rp_err_regs_t); 1418 PCIE_ADV_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_err_regs_t); 1419 PCIE_ADV_RP_REG(pfd_p) = PCIE_ZALLOC(pf_pcie_adv_rp_err_regs_t); 1420 PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ce_src_id = PCIE_INVALID_BDF; 1421 PCIE_ADV_RP_REG(pfd_p)->pcie_rp_ue_src_id = PCIE_INVALID_BDF; 1422 1423 PCIE_ADV_REG(pfd_p)->pcie_ue_sev = pcie_aer_uce_severity; 1424 } 1425 1426 void 1427 pcie_rc_fini_pfd(pf_data_t *pfd_p) 1428 { 1429 kmem_free(PCIE_ADV_RP_REG(pfd_p), sizeof (pf_pcie_adv_rp_err_regs_t)); 1430 kmem_free(PCIE_ADV_REG(pfd_p), sizeof (pf_pcie_adv_err_regs_t)); 1431 kmem_free(PCIE_RP_REG(pfd_p), sizeof (pf_pcie_rp_err_regs_t)); 1432 kmem_free(PCIE_ERR_REG(pfd_p), sizeof (pf_pcie_err_regs_t)); 1433 kmem_free(PCI_BDG_ERR_REG(pfd_p), sizeof (pf_pci_bdg_err_regs_t)); 1434 kmem_free(PFD_AFFECTED_DEV(pfd_p), sizeof (pf_affected_dev_t)); 1435 kmem_free(PCI_ERR_REG(pfd_p), sizeof (pf_pci_err_regs_t)); 1436 kmem_free(PCIE_ROOT_FAULT(pfd_p), sizeof (pf_root_fault_t)); 1437 kmem_free(PCIE_ROOT_EH_SRC(pfd_p), sizeof (pf_root_eh_src_t)); 1438 } 1439 1440 /* 1441 * init pcie_bus_t for root complex 1442 * 1443 * Only a few of the fields in bus_t is valid for root complex. 1444 * The fields that are bracketed are initialized in this routine: 1445 * 1446 * dev_info_t * <bus_dip> 1447 * dev_info_t * bus_rp_dip 1448 * ddi_acc_handle_t bus_cfg_hdl 1449 * uint_t <bus_fm_flags> 1450 * pcie_req_id_t bus_bdf 1451 * pcie_req_id_t bus_rp_bdf 1452 * uint32_t bus_dev_ven_id 1453 * uint8_t bus_rev_id 1454 * uint8_t <bus_hdr_type> 1455 * uint16_t <bus_dev_type> 1456 * uint8_t bus_bdg_secbus 1457 * uint16_t bus_pcie_off 1458 * uint16_t <bus_aer_off> 1459 * uint16_t bus_pcix_off 1460 * uint16_t bus_ecc_ver 1461 * pci_bus_range_t bus_bus_range 1462 * ppb_ranges_t * bus_addr_ranges 1463 * int bus_addr_entries 1464 * pci_regspec_t * bus_assigned_addr 1465 * int bus_assigned_entries 1466 * pf_data_t * bus_pfd 1467 * pcie_domain_t * <bus_dom> 1468 * int bus_mps 1469 * uint64_t bus_cfgacc_base 1470 * void * bus_plat_private 1471 */ 1472 void 1473 pcie_rc_init_bus(dev_info_t *dip) 1474 { 1475 pcie_bus_t *bus_p; 1476 1477 bus_p = (pcie_bus_t *)kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP); 1478 bus_p->bus_dip = dip; 1479 bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_RC_PSEUDO; 1480 bus_p->bus_hdr_type = PCI_HEADER_ONE; 1481 1482 /* Fake that there are AER logs */ 1483 bus_p->bus_aer_off = (uint16_t)-1; 1484 1485 /* Needed only for handle lookup */ 1486 atomic_or_uint(&bus_p->bus_fm_flags, PF_FM_READY); 1487 1488 ndi_set_bus_private(dip, B_FALSE, DEVI_PORT_TYPE_PCI, bus_p); 1489 1490 PCIE_BUS2DOM(bus_p) = PCIE_ZALLOC(pcie_domain_t); 1491 } 1492 1493 void 1494 pcie_rc_fini_bus(dev_info_t *dip) 1495 { 1496 pcie_bus_t *bus_p = PCIE_DIP2DOWNBUS(dip); 1497 ndi_set_bus_private(dip, B_FALSE, 0, NULL); 1498 kmem_free(PCIE_BUS2DOM(bus_p), sizeof (pcie_domain_t)); 1499 kmem_free(bus_p, sizeof (pcie_bus_t)); 1500 } 1501 1502 static int 1503 pcie_width_to_int(pcie_link_width_t width) 1504 { 1505 switch (width) { 1506 case PCIE_LINK_WIDTH_X1: 1507 return (1); 1508 case PCIE_LINK_WIDTH_X2: 1509 return (2); 1510 case PCIE_LINK_WIDTH_X4: 1511 return (4); 1512 case PCIE_LINK_WIDTH_X8: 1513 return (8); 1514 case PCIE_LINK_WIDTH_X12: 1515 return (12); 1516 case PCIE_LINK_WIDTH_X16: 1517 return (16); 1518 case PCIE_LINK_WIDTH_X32: 1519 return (32); 1520 default: 1521 return (0); 1522 } 1523 } 1524 1525 /* 1526 * Return the speed in Transfers / second. This is a signed quantity to match 1527 * the ndi/ddi property interfaces. 1528 */ 1529 static int64_t 1530 pcie_speed_to_int(pcie_link_speed_t speed) 1531 { 1532 switch (speed) { 1533 case PCIE_LINK_SPEED_2_5: 1534 return (2500000000LL); 1535 case PCIE_LINK_SPEED_5: 1536 return (5000000000LL); 1537 case PCIE_LINK_SPEED_8: 1538 return (8000000000LL); 1539 case PCIE_LINK_SPEED_16: 1540 return (16000000000LL); 1541 default: 1542 return (0); 1543 } 1544 } 1545 1546 /* 1547 * Translate the recorded speed information into devinfo properties. 1548 */ 1549 static void 1550 pcie_speeds_to_devinfo(dev_info_t *dip, pcie_bus_t *bus_p) 1551 { 1552 if (bus_p->bus_max_width != PCIE_LINK_WIDTH_UNKNOWN) { 1553 (void) ndi_prop_update_int(DDI_DEV_T_NONE, dip, 1554 "pcie-link-maximum-width", 1555 pcie_width_to_int(bus_p->bus_max_width)); 1556 } 1557 1558 if (bus_p->bus_cur_width != PCIE_LINK_WIDTH_UNKNOWN) { 1559 (void) ndi_prop_update_int(DDI_DEV_T_NONE, dip, 1560 "pcie-link-current-width", 1561 pcie_width_to_int(bus_p->bus_cur_width)); 1562 } 1563 1564 if (bus_p->bus_cur_speed != PCIE_LINK_SPEED_UNKNOWN) { 1565 (void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip, 1566 "pcie-link-current-speed", 1567 pcie_speed_to_int(bus_p->bus_cur_speed)); 1568 } 1569 1570 if (bus_p->bus_max_speed != PCIE_LINK_SPEED_UNKNOWN) { 1571 (void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip, 1572 "pcie-link-maximum-speed", 1573 pcie_speed_to_int(bus_p->bus_max_speed)); 1574 } 1575 1576 if (bus_p->bus_target_speed != PCIE_LINK_SPEED_UNKNOWN) { 1577 (void) ndi_prop_update_int64(DDI_DEV_T_NONE, dip, 1578 "pcie-link-target-speed", 1579 pcie_speed_to_int(bus_p->bus_target_speed)); 1580 } 1581 1582 if ((bus_p->bus_speed_flags & PCIE_LINK_F_ADMIN_TARGET) != 0) { 1583 (void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip, 1584 "pcie-link-admin-target-speed"); 1585 } 1586 1587 if (bus_p->bus_sup_speed != PCIE_LINK_SPEED_UNKNOWN) { 1588 int64_t speeds[4]; 1589 uint_t nspeeds = 0; 1590 1591 if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_2_5) { 1592 speeds[nspeeds++] = 1593 pcie_speed_to_int(PCIE_LINK_SPEED_2_5); 1594 } 1595 1596 if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_5) { 1597 speeds[nspeeds++] = 1598 pcie_speed_to_int(PCIE_LINK_SPEED_5); 1599 } 1600 1601 if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_8) { 1602 speeds[nspeeds++] = 1603 pcie_speed_to_int(PCIE_LINK_SPEED_8); 1604 } 1605 1606 if (bus_p->bus_sup_speed & PCIE_LINK_SPEED_16) { 1607 speeds[nspeeds++] = 1608 pcie_speed_to_int(PCIE_LINK_SPEED_16); 1609 } 1610 1611 (void) ndi_prop_update_int64_array(DDI_DEV_T_NONE, dip, 1612 "pcie-link-supported-speeds", speeds, nspeeds); 1613 } 1614 } 1615 1616 /* 1617 * We need to capture the supported, maximum, and current device speed and 1618 * width. The way that this has been done has changed over time. 1619 * 1620 * Prior to PCIe Gen 3, there were only current and supported speed fields. 1621 * These were found in the link status and link capabilities registers of the 1622 * PCI express capability. With the change to PCIe Gen 3, the information in the 1623 * link capabilities changed to the maximum value. The supported speeds vector 1624 * was moved to the link capabilities 2 register. 1625 * 1626 * Now, a device may not implement some of these registers. To determine whether 1627 * or not it's here, we have to do the following. First, we need to check the 1628 * revision of the PCI express capability. The link capabilities 2 register did 1629 * not exist prior to version 2 of this capability. If a modern device does not 1630 * implement it, it is supposed to return zero for the register. 1631 */ 1632 static void 1633 pcie_capture_speeds(dev_info_t *dip) 1634 { 1635 uint16_t vers, status; 1636 uint32_t cap, cap2, ctl2; 1637 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 1638 dev_info_t *rcdip; 1639 1640 if (!PCIE_IS_PCIE(bus_p)) 1641 return; 1642 1643 rcdip = pcie_get_rc_dip(dip); 1644 if (bus_p->bus_cfg_hdl == NULL) { 1645 vers = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, 1646 bus_p->bus_pcie_off + PCIE_PCIECAP); 1647 } else { 1648 vers = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP); 1649 } 1650 if (vers == PCI_EINVAL16) 1651 return; 1652 vers &= PCIE_PCIECAP_VER_MASK; 1653 1654 /* 1655 * Verify the capability's version. 1656 */ 1657 switch (vers) { 1658 case PCIE_PCIECAP_VER_1_0: 1659 cap2 = 0; 1660 ctl2 = 0; 1661 break; 1662 case PCIE_PCIECAP_VER_2_0: 1663 if (bus_p->bus_cfg_hdl == NULL) { 1664 cap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 1665 bus_p->bus_pcie_off + PCIE_LINKCAP2); 1666 ctl2 = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, 1667 bus_p->bus_pcie_off + PCIE_LINKCTL2); 1668 } else { 1669 cap2 = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP2); 1670 ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2); 1671 } 1672 if (cap2 == PCI_EINVAL32) 1673 cap2 = 0; 1674 if (ctl2 == PCI_EINVAL16) 1675 ctl2 = 0; 1676 break; 1677 default: 1678 /* Don't try and handle an unknown version */ 1679 return; 1680 } 1681 1682 if (bus_p->bus_cfg_hdl == NULL) { 1683 status = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, 1684 bus_p->bus_pcie_off + PCIE_LINKSTS); 1685 cap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 1686 bus_p->bus_pcie_off + PCIE_LINKCAP); 1687 } else { 1688 status = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS); 1689 cap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP); 1690 } 1691 if (status == PCI_EINVAL16 || cap == PCI_EINVAL32) 1692 return; 1693 1694 mutex_enter(&bus_p->bus_speed_mutex); 1695 1696 switch (status & PCIE_LINKSTS_SPEED_MASK) { 1697 case PCIE_LINKSTS_SPEED_2_5: 1698 bus_p->bus_cur_speed = PCIE_LINK_SPEED_2_5; 1699 break; 1700 case PCIE_LINKSTS_SPEED_5: 1701 bus_p->bus_cur_speed = PCIE_LINK_SPEED_5; 1702 break; 1703 case PCIE_LINKSTS_SPEED_8: 1704 bus_p->bus_cur_speed = PCIE_LINK_SPEED_8; 1705 break; 1706 case PCIE_LINKSTS_SPEED_16: 1707 bus_p->bus_cur_speed = PCIE_LINK_SPEED_16; 1708 break; 1709 default: 1710 bus_p->bus_cur_speed = PCIE_LINK_SPEED_UNKNOWN; 1711 break; 1712 } 1713 1714 switch (status & PCIE_LINKSTS_NEG_WIDTH_MASK) { 1715 case PCIE_LINKSTS_NEG_WIDTH_X1: 1716 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X1; 1717 break; 1718 case PCIE_LINKSTS_NEG_WIDTH_X2: 1719 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X2; 1720 break; 1721 case PCIE_LINKSTS_NEG_WIDTH_X4: 1722 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X4; 1723 break; 1724 case PCIE_LINKSTS_NEG_WIDTH_X8: 1725 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X8; 1726 break; 1727 case PCIE_LINKSTS_NEG_WIDTH_X12: 1728 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X12; 1729 break; 1730 case PCIE_LINKSTS_NEG_WIDTH_X16: 1731 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X16; 1732 break; 1733 case PCIE_LINKSTS_NEG_WIDTH_X32: 1734 bus_p->bus_cur_width = PCIE_LINK_WIDTH_X32; 1735 break; 1736 default: 1737 bus_p->bus_cur_width = PCIE_LINK_WIDTH_UNKNOWN; 1738 break; 1739 } 1740 1741 switch (cap & PCIE_LINKCAP_MAX_WIDTH_MASK) { 1742 case PCIE_LINKCAP_MAX_WIDTH_X1: 1743 bus_p->bus_max_width = PCIE_LINK_WIDTH_X1; 1744 break; 1745 case PCIE_LINKCAP_MAX_WIDTH_X2: 1746 bus_p->bus_max_width = PCIE_LINK_WIDTH_X2; 1747 break; 1748 case PCIE_LINKCAP_MAX_WIDTH_X4: 1749 bus_p->bus_max_width = PCIE_LINK_WIDTH_X4; 1750 break; 1751 case PCIE_LINKCAP_MAX_WIDTH_X8: 1752 bus_p->bus_max_width = PCIE_LINK_WIDTH_X8; 1753 break; 1754 case PCIE_LINKCAP_MAX_WIDTH_X12: 1755 bus_p->bus_max_width = PCIE_LINK_WIDTH_X12; 1756 break; 1757 case PCIE_LINKCAP_MAX_WIDTH_X16: 1758 bus_p->bus_max_width = PCIE_LINK_WIDTH_X16; 1759 break; 1760 case PCIE_LINKCAP_MAX_WIDTH_X32: 1761 bus_p->bus_max_width = PCIE_LINK_WIDTH_X32; 1762 break; 1763 default: 1764 bus_p->bus_max_width = PCIE_LINK_WIDTH_UNKNOWN; 1765 break; 1766 } 1767 1768 /* 1769 * If we have the Link Capabilities 2, then we can get the supported 1770 * speeds from it and treat the bits in Link Capabilities 1 as the 1771 * maximum. If we don't, then we need to follow the Implementation Note 1772 * in the standard under Link Capabilities 2. Effectively, this means 1773 * that if the value of 10b is set in Link Capabilities register, that 1774 * it supports both 2.5 and 5 GT/s speeds. 1775 */ 1776 if (cap2 != 0) { 1777 if (cap2 & PCIE_LINKCAP2_SPEED_2_5) 1778 bus_p->bus_sup_speed |= PCIE_LINK_SPEED_2_5; 1779 if (cap2 & PCIE_LINKCAP2_SPEED_5) 1780 bus_p->bus_sup_speed |= PCIE_LINK_SPEED_5; 1781 if (cap2 & PCIE_LINKCAP2_SPEED_8) 1782 bus_p->bus_sup_speed |= PCIE_LINK_SPEED_8; 1783 if (cap2 & PCIE_LINKCAP2_SPEED_16) 1784 bus_p->bus_sup_speed |= PCIE_LINK_SPEED_16; 1785 1786 switch (cap & PCIE_LINKCAP_MAX_SPEED_MASK) { 1787 case PCIE_LINKCAP_MAX_SPEED_2_5: 1788 bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5; 1789 break; 1790 case PCIE_LINKCAP_MAX_SPEED_5: 1791 bus_p->bus_max_speed = PCIE_LINK_SPEED_5; 1792 break; 1793 case PCIE_LINKCAP_MAX_SPEED_8: 1794 bus_p->bus_max_speed = PCIE_LINK_SPEED_8; 1795 break; 1796 case PCIE_LINKCAP_MAX_SPEED_16: 1797 bus_p->bus_max_speed = PCIE_LINK_SPEED_16; 1798 break; 1799 default: 1800 bus_p->bus_max_speed = PCIE_LINK_SPEED_UNKNOWN; 1801 break; 1802 } 1803 } else { 1804 if (cap & PCIE_LINKCAP_MAX_SPEED_5) { 1805 bus_p->bus_max_speed = PCIE_LINK_SPEED_5; 1806 bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5 | 1807 PCIE_LINK_SPEED_5; 1808 } else if (cap & PCIE_LINKCAP_MAX_SPEED_2_5) { 1809 bus_p->bus_max_speed = PCIE_LINK_SPEED_2_5; 1810 bus_p->bus_sup_speed = PCIE_LINK_SPEED_2_5; 1811 } 1812 } 1813 1814 switch (ctl2 & PCIE_LINKCTL2_TARGET_SPEED_MASK) { 1815 case PCIE_LINKCTL2_TARGET_SPEED_2_5: 1816 bus_p->bus_target_speed = PCIE_LINK_SPEED_2_5; 1817 break; 1818 case PCIE_LINKCTL2_TARGET_SPEED_5: 1819 bus_p->bus_target_speed = PCIE_LINK_SPEED_5; 1820 break; 1821 case PCIE_LINKCTL2_TARGET_SPEED_8: 1822 bus_p->bus_target_speed = PCIE_LINK_SPEED_8; 1823 break; 1824 case PCIE_LINKCTL2_TARGET_SPEED_16: 1825 bus_p->bus_target_speed = PCIE_LINK_SPEED_16; 1826 break; 1827 default: 1828 bus_p->bus_target_speed = PCIE_LINK_SPEED_UNKNOWN; 1829 break; 1830 } 1831 1832 pcie_speeds_to_devinfo(dip, bus_p); 1833 mutex_exit(&bus_p->bus_speed_mutex); 1834 } 1835 1836 /* 1837 * partially init pcie_bus_t for device (dip,bdf) for accessing pci 1838 * config space 1839 * 1840 * This routine is invoked during boot, either after creating a devinfo node 1841 * (x86 case) or during px driver attach (sparc case); it is also invoked 1842 * in hotplug context after a devinfo node is created. 1843 * 1844 * The fields that are bracketed are initialized if flag PCIE_BUS_INITIAL 1845 * is set: 1846 * 1847 * dev_info_t * <bus_dip> 1848 * dev_info_t * <bus_rp_dip> 1849 * ddi_acc_handle_t bus_cfg_hdl 1850 * uint_t bus_fm_flags 1851 * pcie_req_id_t <bus_bdf> 1852 * pcie_req_id_t <bus_rp_bdf> 1853 * uint32_t <bus_dev_ven_id> 1854 * uint8_t <bus_rev_id> 1855 * uint8_t <bus_hdr_type> 1856 * uint16_t <bus_dev_type> 1857 * uint8_t <bus_bdg_secbus 1858 * uint16_t <bus_pcie_off> 1859 * uint16_t <bus_aer_off> 1860 * uint16_t <bus_pcix_off> 1861 * uint16_t <bus_ecc_ver> 1862 * pci_bus_range_t bus_bus_range 1863 * ppb_ranges_t * bus_addr_ranges 1864 * int bus_addr_entries 1865 * pci_regspec_t * bus_assigned_addr 1866 * int bus_assigned_entries 1867 * pf_data_t * bus_pfd 1868 * pcie_domain_t * bus_dom 1869 * int bus_mps 1870 * uint64_t bus_cfgacc_base 1871 * void * bus_plat_private 1872 * 1873 * The fields that are bracketed are initialized if flag PCIE_BUS_FINAL 1874 * is set: 1875 * 1876 * dev_info_t * bus_dip 1877 * dev_info_t * bus_rp_dip 1878 * ddi_acc_handle_t bus_cfg_hdl 1879 * uint_t bus_fm_flags 1880 * pcie_req_id_t bus_bdf 1881 * pcie_req_id_t bus_rp_bdf 1882 * uint32_t bus_dev_ven_id 1883 * uint8_t bus_rev_id 1884 * uint8_t bus_hdr_type 1885 * uint16_t bus_dev_type 1886 * uint8_t <bus_bdg_secbus> 1887 * uint16_t bus_pcie_off 1888 * uint16_t bus_aer_off 1889 * uint16_t bus_pcix_off 1890 * uint16_t bus_ecc_ver 1891 * pci_bus_range_t <bus_bus_range> 1892 * ppb_ranges_t * <bus_addr_ranges> 1893 * int <bus_addr_entries> 1894 * pci_regspec_t * <bus_assigned_addr> 1895 * int <bus_assigned_entries> 1896 * pf_data_t * <bus_pfd> 1897 * pcie_domain_t * bus_dom 1898 * int bus_mps 1899 * uint64_t bus_cfgacc_base 1900 * void * <bus_plat_private> 1901 */ 1902 1903 pcie_bus_t * 1904 pcie_init_bus(dev_info_t *dip, pcie_req_id_t bdf, uint8_t flags) 1905 { 1906 uint16_t status, base, baseptr, num_cap; 1907 uint32_t capid; 1908 int range_size; 1909 pcie_bus_t *bus_p = NULL; 1910 dev_info_t *rcdip; 1911 dev_info_t *pdip; 1912 const char *errstr = NULL; 1913 1914 if (!(flags & PCIE_BUS_INITIAL)) 1915 goto initial_done; 1916 1917 bus_p = kmem_zalloc(sizeof (pcie_bus_t), KM_SLEEP); 1918 1919 bus_p->bus_dip = dip; 1920 bus_p->bus_bdf = bdf; 1921 1922 rcdip = pcie_get_rc_dip(dip); 1923 ASSERT(rcdip != NULL); 1924 1925 /* Save the Vendor ID, Device ID and revision ID */ 1926 bus_p->bus_dev_ven_id = pci_cfgacc_get32(rcdip, bdf, PCI_CONF_VENID); 1927 bus_p->bus_rev_id = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_REVID); 1928 /* Save the Header Type */ 1929 bus_p->bus_hdr_type = pci_cfgacc_get8(rcdip, bdf, PCI_CONF_HEADER); 1930 bus_p->bus_hdr_type &= PCI_HEADER_TYPE_M; 1931 1932 /* 1933 * Figure out the device type and all the relavant capability offsets 1934 */ 1935 /* set default value */ 1936 bus_p->bus_dev_type = PCIE_PCIECAP_DEV_TYPE_PCI_PSEUDO; 1937 1938 status = pci_cfgacc_get16(rcdip, bdf, PCI_CONF_STAT); 1939 if (status == PCI_CAP_EINVAL16 || !(status & PCI_STAT_CAP)) 1940 goto caps_done; /* capability not supported */ 1941 1942 /* Relevant conventional capabilities first */ 1943 1944 /* Conventional caps: PCI_CAP_ID_PCI_E, PCI_CAP_ID_PCIX */ 1945 num_cap = 2; 1946 1947 switch (bus_p->bus_hdr_type) { 1948 case PCI_HEADER_ZERO: 1949 baseptr = PCI_CONF_CAP_PTR; 1950 break; 1951 case PCI_HEADER_PPB: 1952 baseptr = PCI_BCNF_CAP_PTR; 1953 break; 1954 case PCI_HEADER_CARDBUS: 1955 baseptr = PCI_CBUS_CAP_PTR; 1956 break; 1957 default: 1958 cmn_err(CE_WARN, "%s: unexpected pci header type:%x", 1959 __func__, bus_p->bus_hdr_type); 1960 goto caps_done; 1961 } 1962 1963 base = baseptr; 1964 for (base = pci_cfgacc_get8(rcdip, bdf, base); base && num_cap; 1965 base = pci_cfgacc_get8(rcdip, bdf, base + PCI_CAP_NEXT_PTR)) { 1966 capid = pci_cfgacc_get8(rcdip, bdf, base); 1967 uint16_t pcap; 1968 1969 switch (capid) { 1970 case PCI_CAP_ID_PCI_E: 1971 bus_p->bus_pcie_off = base; 1972 pcap = pci_cfgacc_get16(rcdip, bdf, base + 1973 PCIE_PCIECAP); 1974 bus_p->bus_dev_type = pcap & PCIE_PCIECAP_DEV_TYPE_MASK; 1975 bus_p->bus_pcie_vers = pcap & PCIE_PCIECAP_VER_MASK; 1976 1977 /* Check and save PCIe hotplug capability information */ 1978 if ((PCIE_IS_RP(bus_p) || PCIE_IS_SWD(bus_p)) && 1979 (pci_cfgacc_get16(rcdip, bdf, base + PCIE_PCIECAP) 1980 & PCIE_PCIECAP_SLOT_IMPL) && 1981 (pci_cfgacc_get32(rcdip, bdf, base + PCIE_SLOTCAP) 1982 & PCIE_SLOTCAP_HP_CAPABLE)) 1983 bus_p->bus_hp_sup_modes |= PCIE_NATIVE_HP_MODE; 1984 1985 num_cap--; 1986 break; 1987 case PCI_CAP_ID_PCIX: 1988 bus_p->bus_pcix_off = base; 1989 if (PCIE_IS_BDG(bus_p)) 1990 bus_p->bus_ecc_ver = 1991 pci_cfgacc_get16(rcdip, bdf, base + 1992 PCI_PCIX_SEC_STATUS) & PCI_PCIX_VER_MASK; 1993 else 1994 bus_p->bus_ecc_ver = 1995 pci_cfgacc_get16(rcdip, bdf, base + 1996 PCI_PCIX_COMMAND) & PCI_PCIX_VER_MASK; 1997 num_cap--; 1998 break; 1999 default: 2000 break; 2001 } 2002 } 2003 2004 /* Check and save PCI hotplug (SHPC) capability information */ 2005 if (PCIE_IS_BDG(bus_p)) { 2006 base = baseptr; 2007 for (base = pci_cfgacc_get8(rcdip, bdf, base); 2008 base; base = pci_cfgacc_get8(rcdip, bdf, 2009 base + PCI_CAP_NEXT_PTR)) { 2010 capid = pci_cfgacc_get8(rcdip, bdf, base); 2011 if (capid == PCI_CAP_ID_PCI_HOTPLUG) { 2012 bus_p->bus_pci_hp_off = base; 2013 bus_p->bus_hp_sup_modes |= PCIE_PCI_HP_MODE; 2014 break; 2015 } 2016 } 2017 } 2018 2019 /* Then, relevant extended capabilities */ 2020 2021 if (!PCIE_IS_PCIE(bus_p)) 2022 goto caps_done; 2023 2024 /* Extended caps: PCIE_EXT_CAP_ID_AER */ 2025 for (base = PCIE_EXT_CAP; base; base = (capid >> 2026 PCIE_EXT_CAP_NEXT_PTR_SHIFT) & PCIE_EXT_CAP_NEXT_PTR_MASK) { 2027 capid = pci_cfgacc_get32(rcdip, bdf, base); 2028 if (capid == PCI_CAP_EINVAL32) 2029 break; 2030 switch ((capid >> PCIE_EXT_CAP_ID_SHIFT) & 2031 PCIE_EXT_CAP_ID_MASK) { 2032 case PCIE_EXT_CAP_ID_AER: 2033 bus_p->bus_aer_off = base; 2034 break; 2035 case PCIE_EXT_CAP_ID_DEV3: 2036 bus_p->bus_dev3_off = base; 2037 break; 2038 } 2039 } 2040 2041 caps_done: 2042 /* save RP dip and RP bdf */ 2043 if (PCIE_IS_RP(bus_p)) { 2044 bus_p->bus_rp_dip = dip; 2045 bus_p->bus_rp_bdf = bus_p->bus_bdf; 2046 2047 bus_p->bus_fab = PCIE_ZALLOC(pcie_fabric_data_t); 2048 } else { 2049 for (pdip = ddi_get_parent(dip); pdip; 2050 pdip = ddi_get_parent(pdip)) { 2051 pcie_bus_t *parent_bus_p = PCIE_DIP2BUS(pdip); 2052 2053 /* 2054 * If RP dip and RP bdf in parent's bus_t have 2055 * been initialized, simply use these instead of 2056 * continuing up to the RC. 2057 */ 2058 if (parent_bus_p->bus_rp_dip != NULL) { 2059 bus_p->bus_rp_dip = parent_bus_p->bus_rp_dip; 2060 bus_p->bus_rp_bdf = parent_bus_p->bus_rp_bdf; 2061 break; 2062 } 2063 2064 /* 2065 * When debugging be aware that some NVIDIA x86 2066 * architectures have 2 nodes for each RP, One at Bus 2067 * 0x0 and one at Bus 0x80. The requester is from Bus 2068 * 0x80 2069 */ 2070 if (PCIE_IS_ROOT(parent_bus_p)) { 2071 bus_p->bus_rp_dip = pdip; 2072 bus_p->bus_rp_bdf = parent_bus_p->bus_bdf; 2073 break; 2074 } 2075 } 2076 } 2077 2078 bus_p->bus_soft_state = PCI_SOFT_STATE_CLOSED; 2079 (void) atomic_swap_uint(&bus_p->bus_fm_flags, 0); 2080 2081 ndi_set_bus_private(dip, B_TRUE, DEVI_PORT_TYPE_PCI, (void *)bus_p); 2082 2083 if (PCIE_IS_HOTPLUG_CAPABLE(dip)) 2084 (void) ndi_prop_create_boolean(DDI_DEV_T_NONE, dip, 2085 "hotplug-capable"); 2086 2087 initial_done: 2088 if (!(flags & PCIE_BUS_FINAL)) 2089 goto final_done; 2090 2091 /* already initialized? */ 2092 bus_p = PCIE_DIP2BUS(dip); 2093 2094 /* Save the Range information if device is a switch/bridge */ 2095 if (PCIE_IS_BDG(bus_p)) { 2096 /* get "bus_range" property */ 2097 range_size = sizeof (pci_bus_range_t); 2098 if (ddi_getlongprop_buf(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, 2099 "bus-range", (caddr_t)&bus_p->bus_bus_range, &range_size) 2100 != DDI_PROP_SUCCESS) { 2101 errstr = "Cannot find \"bus-range\" property"; 2102 cmn_err(CE_WARN, 2103 "PCIE init err info failed BDF 0x%x:%s\n", 2104 bus_p->bus_bdf, errstr); 2105 } 2106 2107 /* get secondary bus number */ 2108 rcdip = pcie_get_rc_dip(dip); 2109 ASSERT(rcdip != NULL); 2110 2111 bus_p->bus_bdg_secbus = pci_cfgacc_get8(rcdip, 2112 bus_p->bus_bdf, PCI_BCNF_SECBUS); 2113 2114 /* Get "ranges" property */ 2115 if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, 2116 "ranges", (caddr_t)&bus_p->bus_addr_ranges, 2117 &bus_p->bus_addr_entries) != DDI_PROP_SUCCESS) 2118 bus_p->bus_addr_entries = 0; 2119 bus_p->bus_addr_entries /= sizeof (ppb_ranges_t); 2120 } 2121 2122 /* save "assigned-addresses" property array, ignore failues */ 2123 if (ddi_getlongprop(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, 2124 "assigned-addresses", (caddr_t)&bus_p->bus_assigned_addr, 2125 &bus_p->bus_assigned_entries) == DDI_PROP_SUCCESS) 2126 bus_p->bus_assigned_entries /= sizeof (pci_regspec_t); 2127 else 2128 bus_p->bus_assigned_entries = 0; 2129 2130 pcie_init_pfd(dip); 2131 2132 pcie_init_plat(dip); 2133 2134 pcie_capture_speeds(dip); 2135 2136 final_done: 2137 2138 PCIE_DBG("Add %s(dip 0x%p, bdf 0x%x, secbus 0x%x)\n", 2139 ddi_driver_name(dip), (void *)dip, bus_p->bus_bdf, 2140 bus_p->bus_bdg_secbus); 2141 #ifdef DEBUG 2142 if (bus_p != NULL) { 2143 pcie_print_bus(bus_p); 2144 } 2145 #endif 2146 2147 return (bus_p); 2148 } 2149 2150 /* 2151 * Invoked before destroying devinfo node, mostly during hotplug 2152 * operation to free pcie_bus_t data structure 2153 */ 2154 /* ARGSUSED */ 2155 void 2156 pcie_fini_bus(dev_info_t *dip, uint8_t flags) 2157 { 2158 pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip); 2159 ASSERT(bus_p); 2160 2161 if (flags & PCIE_BUS_INITIAL) { 2162 pcie_fini_plat(dip); 2163 pcie_fini_pfd(dip); 2164 2165 if (PCIE_IS_RP(bus_p)) { 2166 kmem_free(bus_p->bus_fab, sizeof (pcie_fabric_data_t)); 2167 bus_p->bus_fab = NULL; 2168 } 2169 2170 kmem_free(bus_p->bus_assigned_addr, 2171 (sizeof (pci_regspec_t) * bus_p->bus_assigned_entries)); 2172 kmem_free(bus_p->bus_addr_ranges, 2173 (sizeof (ppb_ranges_t) * bus_p->bus_addr_entries)); 2174 /* zero out the fields that have been destroyed */ 2175 bus_p->bus_assigned_addr = NULL; 2176 bus_p->bus_addr_ranges = NULL; 2177 bus_p->bus_assigned_entries = 0; 2178 bus_p->bus_addr_entries = 0; 2179 } 2180 2181 if (flags & PCIE_BUS_FINAL) { 2182 if (PCIE_IS_HOTPLUG_CAPABLE(dip)) { 2183 (void) ndi_prop_remove(DDI_DEV_T_NONE, dip, 2184 "hotplug-capable"); 2185 } 2186 2187 ndi_set_bus_private(dip, B_TRUE, 0, NULL); 2188 kmem_free(bus_p, sizeof (pcie_bus_t)); 2189 } 2190 } 2191 2192 int 2193 pcie_postattach_child(dev_info_t *cdip) 2194 { 2195 pcie_bus_t *bus_p = PCIE_DIP2BUS(cdip); 2196 2197 if (!bus_p) 2198 return (DDI_FAILURE); 2199 2200 return (pcie_enable_ce(cdip)); 2201 } 2202 2203 /* 2204 * PCI-Express child device de-initialization. 2205 * This function disables generic pci-express interrupts and error 2206 * handling. 2207 */ 2208 void 2209 pcie_uninitchild(dev_info_t *cdip) 2210 { 2211 pcie_disable_errors(cdip); 2212 pcie_fini_cfghdl(cdip); 2213 pcie_fini_dom(cdip); 2214 } 2215 2216 /* 2217 * find the root complex dip 2218 */ 2219 dev_info_t * 2220 pcie_get_rc_dip(dev_info_t *dip) 2221 { 2222 dev_info_t *rcdip; 2223 pcie_bus_t *rc_bus_p; 2224 2225 for (rcdip = ddi_get_parent(dip); rcdip; 2226 rcdip = ddi_get_parent(rcdip)) { 2227 rc_bus_p = PCIE_DIP2BUS(rcdip); 2228 if (rc_bus_p && PCIE_IS_RC(rc_bus_p)) 2229 break; 2230 } 2231 2232 return (rcdip); 2233 } 2234 2235 boolean_t 2236 pcie_is_pci_device(dev_info_t *dip) 2237 { 2238 dev_info_t *pdip; 2239 char *device_type; 2240 2241 pdip = ddi_get_parent(dip); 2242 if (pdip == NULL) 2243 return (B_FALSE); 2244 2245 if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, DDI_PROP_DONTPASS, 2246 "device_type", &device_type) != DDI_PROP_SUCCESS) 2247 return (B_FALSE); 2248 2249 if (strcmp(device_type, "pciex") != 0 && 2250 strcmp(device_type, "pci") != 0) { 2251 ddi_prop_free(device_type); 2252 return (B_FALSE); 2253 } 2254 2255 ddi_prop_free(device_type); 2256 return (B_TRUE); 2257 } 2258 2259 typedef struct { 2260 boolean_t init; 2261 uint8_t flags; 2262 } pcie_bus_arg_t; 2263 2264 /*ARGSUSED*/ 2265 static int 2266 pcie_fab_do_init_fini(dev_info_t *dip, void *arg) 2267 { 2268 pcie_req_id_t bdf; 2269 pcie_bus_arg_t *bus_arg = (pcie_bus_arg_t *)arg; 2270 2271 if (!pcie_is_pci_device(dip)) 2272 goto out; 2273 2274 if (bus_arg->init) { 2275 if (pcie_get_bdf_from_dip(dip, &bdf) != DDI_SUCCESS) 2276 goto out; 2277 2278 (void) pcie_init_bus(dip, bdf, bus_arg->flags); 2279 } else { 2280 (void) pcie_fini_bus(dip, bus_arg->flags); 2281 } 2282 2283 return (DDI_WALK_CONTINUE); 2284 2285 out: 2286 return (DDI_WALK_PRUNECHILD); 2287 } 2288 2289 void 2290 pcie_fab_init_bus(dev_info_t *rcdip, uint8_t flags) 2291 { 2292 int circular_count; 2293 dev_info_t *dip = ddi_get_child(rcdip); 2294 pcie_bus_arg_t arg; 2295 2296 arg.init = B_TRUE; 2297 arg.flags = flags; 2298 2299 ndi_devi_enter(rcdip, &circular_count); 2300 ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg); 2301 ndi_devi_exit(rcdip, circular_count); 2302 } 2303 2304 void 2305 pcie_fab_fini_bus(dev_info_t *rcdip, uint8_t flags) 2306 { 2307 int circular_count; 2308 dev_info_t *dip = ddi_get_child(rcdip); 2309 pcie_bus_arg_t arg; 2310 2311 arg.init = B_FALSE; 2312 arg.flags = flags; 2313 2314 ndi_devi_enter(rcdip, &circular_count); 2315 ddi_walk_devs(dip, pcie_fab_do_init_fini, &arg); 2316 ndi_devi_exit(rcdip, circular_count); 2317 } 2318 2319 void 2320 pcie_enable_errors(dev_info_t *dip) 2321 { 2322 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2323 uint16_t reg16, tmp16; 2324 uint32_t reg32, tmp32; 2325 2326 ASSERT(bus_p); 2327 2328 /* 2329 * Clear any pending errors 2330 */ 2331 pcie_clear_errors(dip); 2332 2333 if (!PCIE_IS_PCIE(bus_p)) 2334 return; 2335 2336 /* 2337 * Enable Baseline Error Handling but leave CE reporting off (poweron 2338 * default). 2339 */ 2340 if ((reg16 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL)) != 2341 PCI_CAP_EINVAL16) { 2342 tmp16 = (reg16 & (PCIE_DEVCTL_MAX_READ_REQ_MASK | 2343 PCIE_DEVCTL_MAX_PAYLOAD_MASK)) | 2344 (pcie_devctl_default & ~(PCIE_DEVCTL_MAX_READ_REQ_MASK | 2345 PCIE_DEVCTL_MAX_PAYLOAD_MASK)) | 2346 (pcie_base_err_default & (~PCIE_DEVCTL_CE_REPORTING_EN)); 2347 2348 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, tmp16); 2349 PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, reg16); 2350 } 2351 2352 /* Enable Root Port Baseline Error Receiving */ 2353 if (PCIE_IS_ROOT(bus_p) && 2354 (reg16 = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL)) != 2355 PCI_CAP_EINVAL16) { 2356 2357 tmp16 = pcie_serr_disable_flag ? 2358 (pcie_root_ctrl_default & ~PCIE_ROOT_SYS_ERR) : 2359 pcie_root_ctrl_default; 2360 PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, tmp16); 2361 PCIE_DBG_CAP(dip, bus_p, "ROOT DEVCTL", 16, PCIE_ROOTCTL, 2362 reg16); 2363 } 2364 2365 /* 2366 * Enable PCI-Express Advanced Error Handling if Exists 2367 */ 2368 if (!PCIE_HAS_AER(bus_p)) 2369 return; 2370 2371 /* Set Uncorrectable Severity */ 2372 if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_SERV)) != 2373 PCI_CAP_EINVAL32) { 2374 tmp32 = pcie_aer_uce_severity; 2375 2376 PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_SERV, tmp32); 2377 PCIE_DBG_AER(dip, bus_p, "AER UCE SEV", 32, PCIE_AER_UCE_SERV, 2378 reg32); 2379 } 2380 2381 /* Enable Uncorrectable errors */ 2382 if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_UCE_MASK)) != 2383 PCI_CAP_EINVAL32) { 2384 tmp32 = pcie_aer_uce_mask; 2385 2386 PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, tmp32); 2387 PCIE_DBG_AER(dip, bus_p, "AER UCE MASK", 32, PCIE_AER_UCE_MASK, 2388 reg32); 2389 } 2390 2391 /* Enable ECRC generation and checking */ 2392 if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) != 2393 PCI_CAP_EINVAL32) { 2394 tmp32 = reg32 | pcie_ecrc_value; 2395 PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, tmp32); 2396 PCIE_DBG_AER(dip, bus_p, "AER CTL", 32, PCIE_AER_CTL, reg32); 2397 } 2398 2399 /* Enable Secondary Uncorrectable errors if this is a bridge */ 2400 if (!PCIE_IS_PCIE_BDG(bus_p)) 2401 goto root; 2402 2403 /* Set Uncorrectable Severity */ 2404 if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_SERV)) != 2405 PCI_CAP_EINVAL32) { 2406 tmp32 = pcie_aer_suce_severity; 2407 2408 PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_SERV, tmp32); 2409 PCIE_DBG_AER(dip, bus_p, "AER SUCE SEV", 32, PCIE_AER_SUCE_SERV, 2410 reg32); 2411 } 2412 2413 if ((reg32 = PCIE_AER_GET(32, bus_p, PCIE_AER_SUCE_MASK)) != 2414 PCI_CAP_EINVAL32) { 2415 PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, pcie_aer_suce_mask); 2416 PCIE_DBG_AER(dip, bus_p, "AER SUCE MASK", 32, 2417 PCIE_AER_SUCE_MASK, reg32); 2418 } 2419 2420 root: 2421 /* 2422 * Enable Root Control this is a Root device 2423 */ 2424 if (!PCIE_IS_ROOT(bus_p)) 2425 return; 2426 2427 if ((reg16 = PCIE_AER_GET(16, bus_p, PCIE_AER_RE_CMD)) != 2428 PCI_CAP_EINVAL16) { 2429 PCIE_AER_PUT(16, bus_p, PCIE_AER_RE_CMD, 2430 pcie_root_error_cmd_default); 2431 PCIE_DBG_AER(dip, bus_p, "AER Root Err Cmd", 16, 2432 PCIE_AER_RE_CMD, reg16); 2433 } 2434 } 2435 2436 /* 2437 * This function is used for enabling CE reporting and setting the AER CE mask. 2438 * When called from outside the pcie module it should always be preceded by 2439 * a call to pcie_enable_errors. 2440 */ 2441 int 2442 pcie_enable_ce(dev_info_t *dip) 2443 { 2444 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2445 uint16_t device_sts, device_ctl; 2446 uint32_t tmp_pcie_aer_ce_mask; 2447 2448 if (!PCIE_IS_PCIE(bus_p)) 2449 return (DDI_SUCCESS); 2450 2451 /* 2452 * The "pcie_ce_mask" property is used to control both the CE reporting 2453 * enable field in the device control register and the AER CE mask. We 2454 * leave CE reporting disabled if pcie_ce_mask is set to -1. 2455 */ 2456 2457 tmp_pcie_aer_ce_mask = (uint32_t)ddi_prop_get_int(DDI_DEV_T_ANY, dip, 2458 DDI_PROP_DONTPASS, "pcie_ce_mask", pcie_aer_ce_mask); 2459 2460 if (tmp_pcie_aer_ce_mask == (uint32_t)-1) { 2461 /* 2462 * Nothing to do since CE reporting has already been disabled. 2463 */ 2464 return (DDI_SUCCESS); 2465 } 2466 2467 if (PCIE_HAS_AER(bus_p)) { 2468 /* Enable AER CE */ 2469 PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, tmp_pcie_aer_ce_mask); 2470 PCIE_DBG_AER(dip, bus_p, "AER CE MASK", 32, PCIE_AER_CE_MASK, 2471 0); 2472 2473 /* Clear any pending AER CE errors */ 2474 PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_STS, -1); 2475 } 2476 2477 /* clear any pending CE errors */ 2478 if ((device_sts = PCIE_CAP_GET(16, bus_p, PCIE_DEVSTS)) != 2479 PCI_CAP_EINVAL16) 2480 PCIE_CAP_PUT(16, bus_p, PCIE_DEVSTS, 2481 device_sts & (~PCIE_DEVSTS_CE_DETECTED)); 2482 2483 /* Enable CE reporting */ 2484 device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL); 2485 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, 2486 (device_ctl & (~PCIE_DEVCTL_ERR_MASK)) | pcie_base_err_default); 2487 PCIE_DBG_CAP(dip, bus_p, "DEVCTL", 16, PCIE_DEVCTL, device_ctl); 2488 2489 return (DDI_SUCCESS); 2490 } 2491 2492 /* ARGSUSED */ 2493 void 2494 pcie_disable_errors(dev_info_t *dip) 2495 { 2496 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2497 uint16_t device_ctl; 2498 uint32_t aer_reg; 2499 2500 if (!PCIE_IS_PCIE(bus_p)) 2501 return; 2502 2503 /* 2504 * Disable PCI-Express Baseline Error Handling 2505 */ 2506 device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL); 2507 device_ctl &= ~PCIE_DEVCTL_ERR_MASK; 2508 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL, device_ctl); 2509 2510 /* 2511 * Disable PCI-Express Advanced Error Handling if Exists 2512 */ 2513 if (!PCIE_HAS_AER(bus_p)) 2514 goto root; 2515 2516 /* Disable Uncorrectable errors */ 2517 PCIE_AER_PUT(32, bus_p, PCIE_AER_UCE_MASK, PCIE_AER_UCE_BITS); 2518 2519 /* Disable Correctable errors */ 2520 PCIE_AER_PUT(32, bus_p, PCIE_AER_CE_MASK, PCIE_AER_CE_BITS); 2521 2522 /* Disable ECRC generation and checking */ 2523 if ((aer_reg = PCIE_AER_GET(32, bus_p, PCIE_AER_CTL)) != 2524 PCI_CAP_EINVAL32) { 2525 aer_reg &= ~(PCIE_AER_CTL_ECRC_GEN_ENA | 2526 PCIE_AER_CTL_ECRC_CHECK_ENA); 2527 2528 PCIE_AER_PUT(32, bus_p, PCIE_AER_CTL, aer_reg); 2529 } 2530 /* 2531 * Disable Secondary Uncorrectable errors if this is a bridge 2532 */ 2533 if (!PCIE_IS_PCIE_BDG(bus_p)) 2534 goto root; 2535 2536 PCIE_AER_PUT(32, bus_p, PCIE_AER_SUCE_MASK, PCIE_AER_SUCE_BITS); 2537 2538 root: 2539 /* 2540 * disable Root Control this is a Root device 2541 */ 2542 if (!PCIE_IS_ROOT(bus_p)) 2543 return; 2544 2545 if (!pcie_serr_disable_flag) { 2546 device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_ROOTCTL); 2547 device_ctl &= ~PCIE_ROOT_SYS_ERR; 2548 PCIE_CAP_PUT(16, bus_p, PCIE_ROOTCTL, device_ctl); 2549 } 2550 2551 if (!PCIE_HAS_AER(bus_p)) 2552 return; 2553 2554 if ((device_ctl = PCIE_CAP_GET(16, bus_p, PCIE_AER_RE_CMD)) != 2555 PCI_CAP_EINVAL16) { 2556 device_ctl &= ~pcie_root_error_cmd_default; 2557 PCIE_CAP_PUT(16, bus_p, PCIE_AER_RE_CMD, device_ctl); 2558 } 2559 } 2560 2561 /* 2562 * Extract bdf from "reg" property. 2563 */ 2564 int 2565 pcie_get_bdf_from_dip(dev_info_t *dip, pcie_req_id_t *bdf) 2566 { 2567 pci_regspec_t *regspec; 2568 int reglen; 2569 2570 if (ddi_prop_lookup_int_array(DDI_DEV_T_ANY, dip, DDI_PROP_DONTPASS, 2571 "reg", (int **)®spec, (uint_t *)®len) != DDI_SUCCESS) 2572 return (DDI_FAILURE); 2573 2574 if (reglen < (sizeof (pci_regspec_t) / sizeof (int))) { 2575 ddi_prop_free(regspec); 2576 return (DDI_FAILURE); 2577 } 2578 2579 /* Get phys_hi from first element. All have same bdf. */ 2580 *bdf = (regspec->pci_phys_hi & (PCI_REG_BDFR_M ^ PCI_REG_REG_M)) >> 8; 2581 2582 ddi_prop_free(regspec); 2583 return (DDI_SUCCESS); 2584 } 2585 2586 dev_info_t * 2587 pcie_get_my_childs_dip(dev_info_t *dip, dev_info_t *rdip) 2588 { 2589 dev_info_t *cdip = rdip; 2590 2591 for (; ddi_get_parent(cdip) != dip; cdip = ddi_get_parent(cdip)) 2592 ; 2593 2594 return (cdip); 2595 } 2596 2597 uint32_t 2598 pcie_get_bdf_for_dma_xfer(dev_info_t *dip, dev_info_t *rdip) 2599 { 2600 dev_info_t *cdip; 2601 2602 /* 2603 * As part of the probing, the PCI fcode interpreter may setup a DMA 2604 * request if a given card has a fcode on it using dip and rdip of the 2605 * hotplug connector i.e, dip and rdip of px/pcieb driver. In this 2606 * case, return a invalid value for the bdf since we cannot get to the 2607 * bdf value of the actual device which will be initiating this DMA. 2608 */ 2609 if (rdip == dip) 2610 return (PCIE_INVALID_BDF); 2611 2612 cdip = pcie_get_my_childs_dip(dip, rdip); 2613 2614 /* 2615 * For a given rdip, return the bdf value of dip's (px or pcieb) 2616 * immediate child or secondary bus-id if dip is a PCIe2PCI bridge. 2617 * 2618 * XXX - For now, return a invalid bdf value for all PCI and PCI-X 2619 * devices since this needs more work. 2620 */ 2621 return (PCI_GET_PCIE2PCI_SECBUS(cdip) ? 2622 PCIE_INVALID_BDF : PCI_GET_BDF(cdip)); 2623 } 2624 2625 uint32_t 2626 pcie_get_aer_uce_mask() 2627 { 2628 return (pcie_aer_uce_mask); 2629 } 2630 uint32_t 2631 pcie_get_aer_ce_mask() 2632 { 2633 return (pcie_aer_ce_mask); 2634 } 2635 uint32_t 2636 pcie_get_aer_suce_mask() 2637 { 2638 return (pcie_aer_suce_mask); 2639 } 2640 uint32_t 2641 pcie_get_serr_mask() 2642 { 2643 return (pcie_serr_disable_flag); 2644 } 2645 2646 void 2647 pcie_set_aer_uce_mask(uint32_t mask) 2648 { 2649 pcie_aer_uce_mask = mask; 2650 if (mask & PCIE_AER_UCE_UR) 2651 pcie_base_err_default &= ~PCIE_DEVCTL_UR_REPORTING_EN; 2652 else 2653 pcie_base_err_default |= PCIE_DEVCTL_UR_REPORTING_EN; 2654 2655 if (mask & PCIE_AER_UCE_ECRC) 2656 pcie_ecrc_value = 0; 2657 } 2658 2659 void 2660 pcie_set_aer_ce_mask(uint32_t mask) 2661 { 2662 pcie_aer_ce_mask = mask; 2663 } 2664 void 2665 pcie_set_aer_suce_mask(uint32_t mask) 2666 { 2667 pcie_aer_suce_mask = mask; 2668 } 2669 void 2670 pcie_set_serr_mask(uint32_t mask) 2671 { 2672 pcie_serr_disable_flag = mask; 2673 } 2674 2675 /* 2676 * Is the rdip a child of dip. Used for checking certain CTLOPS from bubbling 2677 * up erronously. Ex. ISA ctlops to a PCI-PCI Bridge. 2678 */ 2679 boolean_t 2680 pcie_is_child(dev_info_t *dip, dev_info_t *rdip) 2681 { 2682 dev_info_t *cdip = ddi_get_child(dip); 2683 for (; cdip; cdip = ddi_get_next_sibling(cdip)) 2684 if (cdip == rdip) 2685 break; 2686 return (cdip != NULL); 2687 } 2688 2689 boolean_t 2690 pcie_is_link_disabled(dev_info_t *dip) 2691 { 2692 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2693 2694 if (PCIE_IS_PCIE(bus_p)) { 2695 if (PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL) & 2696 PCIE_LINKCTL_LINK_DISABLE) 2697 return (B_TRUE); 2698 } 2699 return (B_FALSE); 2700 } 2701 2702 /* 2703 * Determines if there are any root ports attached to a root complex. 2704 * 2705 * dip - dip of root complex 2706 * 2707 * Returns - DDI_SUCCESS if there is at least one root port otherwise 2708 * DDI_FAILURE. 2709 */ 2710 int 2711 pcie_root_port(dev_info_t *dip) 2712 { 2713 int port_type; 2714 uint16_t cap_ptr; 2715 ddi_acc_handle_t config_handle; 2716 dev_info_t *cdip = ddi_get_child(dip); 2717 2718 /* 2719 * Determine if any of the children of the passed in dip 2720 * are root ports. 2721 */ 2722 for (; cdip; cdip = ddi_get_next_sibling(cdip)) { 2723 2724 if (pci_config_setup(cdip, &config_handle) != DDI_SUCCESS) 2725 continue; 2726 2727 if ((PCI_CAP_LOCATE(config_handle, PCI_CAP_ID_PCI_E, 2728 &cap_ptr)) == DDI_FAILURE) { 2729 pci_config_teardown(&config_handle); 2730 continue; 2731 } 2732 2733 port_type = PCI_CAP_GET16(config_handle, 0, cap_ptr, 2734 PCIE_PCIECAP) & PCIE_PCIECAP_DEV_TYPE_MASK; 2735 2736 pci_config_teardown(&config_handle); 2737 2738 if (port_type == PCIE_PCIECAP_DEV_TYPE_ROOT) 2739 return (DDI_SUCCESS); 2740 } 2741 2742 /* No root ports were found */ 2743 2744 return (DDI_FAILURE); 2745 } 2746 2747 /* 2748 * Function that determines if a device a PCIe device. 2749 * 2750 * dip - dip of device. 2751 * 2752 * returns - DDI_SUCCESS if device is a PCIe device, otherwise DDI_FAILURE. 2753 */ 2754 int 2755 pcie_dev(dev_info_t *dip) 2756 { 2757 /* get parent device's device_type property */ 2758 char *device_type; 2759 int rc = DDI_FAILURE; 2760 dev_info_t *pdip = ddi_get_parent(dip); 2761 2762 if (ddi_prop_lookup_string(DDI_DEV_T_ANY, pdip, 2763 DDI_PROP_DONTPASS, "device_type", &device_type) 2764 != DDI_PROP_SUCCESS) { 2765 return (DDI_FAILURE); 2766 } 2767 2768 if (strcmp(device_type, "pciex") == 0) 2769 rc = DDI_SUCCESS; 2770 else 2771 rc = DDI_FAILURE; 2772 2773 ddi_prop_free(device_type); 2774 return (rc); 2775 } 2776 2777 /* 2778 * Function to map in a device's memory space. 2779 */ 2780 static int 2781 pcie_map_phys(dev_info_t *dip, pci_regspec_t *phys_spec, 2782 caddr_t *addrp, ddi_acc_handle_t *handlep) 2783 { 2784 ddi_map_req_t mr; 2785 ddi_acc_hdl_t *hp; 2786 int result; 2787 ddi_device_acc_attr_t attr; 2788 2789 attr.devacc_attr_version = DDI_DEVICE_ATTR_V0; 2790 attr.devacc_attr_endian_flags = DDI_STRUCTURE_LE_ACC; 2791 attr.devacc_attr_dataorder = DDI_STRICTORDER_ACC; 2792 attr.devacc_attr_access = DDI_CAUTIOUS_ACC; 2793 2794 *handlep = impl_acc_hdl_alloc(KM_SLEEP, NULL); 2795 hp = impl_acc_hdl_get(*handlep); 2796 hp->ah_vers = VERS_ACCHDL; 2797 hp->ah_dip = dip; 2798 hp->ah_rnumber = 0; 2799 hp->ah_offset = 0; 2800 hp->ah_len = 0; 2801 hp->ah_acc = attr; 2802 2803 mr.map_op = DDI_MO_MAP_LOCKED; 2804 mr.map_type = DDI_MT_REGSPEC; 2805 mr.map_obj.rp = (struct regspec *)phys_spec; 2806 mr.map_prot = PROT_READ | PROT_WRITE; 2807 mr.map_flags = DDI_MF_KERNEL_MAPPING; 2808 mr.map_handlep = hp; 2809 mr.map_vers = DDI_MAP_VERSION; 2810 2811 result = ddi_map(dip, &mr, 0, 0, addrp); 2812 2813 if (result != DDI_SUCCESS) { 2814 impl_acc_hdl_free(*handlep); 2815 *handlep = (ddi_acc_handle_t)NULL; 2816 } else { 2817 hp->ah_addr = *addrp; 2818 } 2819 2820 return (result); 2821 } 2822 2823 /* 2824 * Map out memory that was mapped in with pcie_map_phys(); 2825 */ 2826 static void 2827 pcie_unmap_phys(ddi_acc_handle_t *handlep, pci_regspec_t *ph) 2828 { 2829 ddi_map_req_t mr; 2830 ddi_acc_hdl_t *hp; 2831 2832 hp = impl_acc_hdl_get(*handlep); 2833 ASSERT(hp); 2834 2835 mr.map_op = DDI_MO_UNMAP; 2836 mr.map_type = DDI_MT_REGSPEC; 2837 mr.map_obj.rp = (struct regspec *)ph; 2838 mr.map_prot = PROT_READ | PROT_WRITE; 2839 mr.map_flags = DDI_MF_KERNEL_MAPPING; 2840 mr.map_handlep = hp; 2841 mr.map_vers = DDI_MAP_VERSION; 2842 2843 (void) ddi_map(hp->ah_dip, &mr, hp->ah_offset, 2844 hp->ah_len, &hp->ah_addr); 2845 2846 impl_acc_hdl_free(*handlep); 2847 *handlep = (ddi_acc_handle_t)NULL; 2848 } 2849 2850 void 2851 pcie_set_rber_fatal(dev_info_t *dip, boolean_t val) 2852 { 2853 pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip); 2854 bus_p->bus_pfd->pe_rber_fatal = val; 2855 } 2856 2857 /* 2858 * Return parent Root Port's pe_rber_fatal value. 2859 */ 2860 boolean_t 2861 pcie_get_rber_fatal(dev_info_t *dip) 2862 { 2863 pcie_bus_t *bus_p = PCIE_DIP2UPBUS(dip); 2864 pcie_bus_t *rp_bus_p = PCIE_DIP2UPBUS(bus_p->bus_rp_dip); 2865 return (rp_bus_p->bus_pfd->pe_rber_fatal); 2866 } 2867 2868 int 2869 pcie_ari_supported(dev_info_t *dip) 2870 { 2871 uint32_t devcap2; 2872 uint16_t pciecap; 2873 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2874 uint8_t dev_type; 2875 2876 PCIE_DBG("pcie_ari_supported: dip=%p\n", dip); 2877 2878 if (bus_p == NULL) 2879 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2880 2881 dev_type = bus_p->bus_dev_type; 2882 2883 if ((dev_type != PCIE_PCIECAP_DEV_TYPE_DOWN) && 2884 (dev_type != PCIE_PCIECAP_DEV_TYPE_ROOT)) 2885 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2886 2887 if (pcie_disable_ari) { 2888 PCIE_DBG("pcie_ari_supported: dip=%p: ARI Disabled\n", dip); 2889 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2890 } 2891 2892 pciecap = PCIE_CAP_GET(16, bus_p, PCIE_PCIECAP); 2893 2894 if ((pciecap & PCIE_PCIECAP_VER_MASK) < PCIE_PCIECAP_VER_2_0) { 2895 PCIE_DBG("pcie_ari_supported: dip=%p: Not 2.0\n", dip); 2896 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2897 } 2898 2899 devcap2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCAP2); 2900 2901 PCIE_DBG("pcie_ari_supported: dip=%p: DevCap2=0x%x\n", 2902 dip, devcap2); 2903 2904 if (devcap2 & PCIE_DEVCAP2_ARI_FORWARD) { 2905 PCIE_DBG("pcie_ari_supported: " 2906 "dip=%p: ARI Forwarding is supported\n", dip); 2907 return (PCIE_ARI_FORW_SUPPORTED); 2908 } 2909 return (PCIE_ARI_FORW_NOT_SUPPORTED); 2910 } 2911 2912 int 2913 pcie_ari_enable(dev_info_t *dip) 2914 { 2915 uint16_t devctl2; 2916 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2917 2918 PCIE_DBG("pcie_ari_enable: dip=%p\n", dip); 2919 2920 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2921 return (DDI_FAILURE); 2922 2923 devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2); 2924 devctl2 |= PCIE_DEVCTL2_ARI_FORWARD_EN; 2925 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2); 2926 2927 PCIE_DBG("pcie_ari_enable: dip=%p: writing 0x%x to DevCtl2\n", 2928 dip, devctl2); 2929 2930 return (DDI_SUCCESS); 2931 } 2932 2933 int 2934 pcie_ari_disable(dev_info_t *dip) 2935 { 2936 uint16_t devctl2; 2937 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2938 2939 PCIE_DBG("pcie_ari_disable: dip=%p\n", dip); 2940 2941 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2942 return (DDI_FAILURE); 2943 2944 devctl2 = PCIE_CAP_GET(16, bus_p, PCIE_DEVCTL2); 2945 devctl2 &= ~PCIE_DEVCTL2_ARI_FORWARD_EN; 2946 PCIE_CAP_PUT(16, bus_p, PCIE_DEVCTL2, devctl2); 2947 2948 PCIE_DBG("pcie_ari_disable: dip=%p: writing 0x%x to DevCtl2\n", 2949 dip, devctl2); 2950 2951 return (DDI_SUCCESS); 2952 } 2953 2954 int 2955 pcie_ari_is_enabled(dev_info_t *dip) 2956 { 2957 uint16_t devctl2; 2958 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 2959 2960 PCIE_DBG("pcie_ari_is_enabled: dip=%p\n", dip); 2961 2962 if (pcie_ari_supported(dip) == PCIE_ARI_FORW_NOT_SUPPORTED) 2963 return (PCIE_ARI_FORW_DISABLED); 2964 2965 devctl2 = PCIE_CAP_GET(32, bus_p, PCIE_DEVCTL2); 2966 2967 PCIE_DBG("pcie_ari_is_enabled: dip=%p: DevCtl2=0x%x\n", 2968 dip, devctl2); 2969 2970 if (devctl2 & PCIE_DEVCTL2_ARI_FORWARD_EN) { 2971 PCIE_DBG("pcie_ari_is_enabled: " 2972 "dip=%p: ARI Forwarding is enabled\n", dip); 2973 return (PCIE_ARI_FORW_ENABLED); 2974 } 2975 2976 return (PCIE_ARI_FORW_DISABLED); 2977 } 2978 2979 int 2980 pcie_ari_device(dev_info_t *dip) 2981 { 2982 ddi_acc_handle_t handle; 2983 uint16_t cap_ptr; 2984 2985 PCIE_DBG("pcie_ari_device: dip=%p\n", dip); 2986 2987 /* 2988 * XXX - This function may be called before the bus_p structure 2989 * has been populated. This code can be changed to remove 2990 * pci_config_setup()/pci_config_teardown() when the RFE 2991 * to populate the bus_p structures early in boot is putback. 2992 */ 2993 2994 /* First make sure it is a PCIe device */ 2995 2996 if (pci_config_setup(dip, &handle) != DDI_SUCCESS) 2997 return (PCIE_NOT_ARI_DEVICE); 2998 2999 if ((PCI_CAP_LOCATE(handle, PCI_CAP_ID_PCI_E, &cap_ptr)) 3000 != DDI_SUCCESS) { 3001 pci_config_teardown(&handle); 3002 return (PCIE_NOT_ARI_DEVICE); 3003 } 3004 3005 /* Locate the ARI Capability */ 3006 3007 if ((PCI_CAP_LOCATE(handle, PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI), 3008 &cap_ptr)) == DDI_FAILURE) { 3009 pci_config_teardown(&handle); 3010 return (PCIE_NOT_ARI_DEVICE); 3011 } 3012 3013 /* ARI Capability was found so it must be a ARI device */ 3014 PCIE_DBG("pcie_ari_device: ARI Device dip=%p\n", dip); 3015 3016 pci_config_teardown(&handle); 3017 return (PCIE_ARI_DEVICE); 3018 } 3019 3020 int 3021 pcie_ari_get_next_function(dev_info_t *dip, int *func) 3022 { 3023 uint32_t val; 3024 uint16_t cap_ptr, next_function; 3025 ddi_acc_handle_t handle; 3026 3027 /* 3028 * XXX - This function may be called before the bus_p structure 3029 * has been populated. This code can be changed to remove 3030 * pci_config_setup()/pci_config_teardown() when the RFE 3031 * to populate the bus_p structures early in boot is putback. 3032 */ 3033 3034 if (pci_config_setup(dip, &handle) != DDI_SUCCESS) 3035 return (DDI_FAILURE); 3036 3037 if ((PCI_CAP_LOCATE(handle, 3038 PCI_CAP_XCFG_SPC(PCIE_EXT_CAP_ID_ARI), &cap_ptr)) == DDI_FAILURE) { 3039 pci_config_teardown(&handle); 3040 return (DDI_FAILURE); 3041 } 3042 3043 val = PCI_CAP_GET32(handle, 0, cap_ptr, PCIE_ARI_CAP); 3044 3045 next_function = (val >> PCIE_ARI_CAP_NEXT_FUNC_SHIFT) & 3046 PCIE_ARI_CAP_NEXT_FUNC_MASK; 3047 3048 pci_config_teardown(&handle); 3049 3050 *func = next_function; 3051 3052 return (DDI_SUCCESS); 3053 } 3054 3055 dev_info_t * 3056 pcie_func_to_dip(dev_info_t *dip, pcie_req_id_t function) 3057 { 3058 pcie_req_id_t child_bdf; 3059 dev_info_t *cdip; 3060 3061 for (cdip = ddi_get_child(dip); cdip; 3062 cdip = ddi_get_next_sibling(cdip)) { 3063 3064 if (pcie_get_bdf_from_dip(cdip, &child_bdf) == DDI_FAILURE) 3065 return (NULL); 3066 3067 if ((child_bdf & PCIE_REQ_ID_ARI_FUNC_MASK) == function) 3068 return (cdip); 3069 } 3070 return (NULL); 3071 } 3072 3073 #ifdef DEBUG 3074 3075 static void 3076 pcie_print_bus(pcie_bus_t *bus_p) 3077 { 3078 pcie_dbg("\tbus_dip = 0x%p\n", bus_p->bus_dip); 3079 pcie_dbg("\tbus_fm_flags = 0x%x\n", bus_p->bus_fm_flags); 3080 3081 pcie_dbg("\tbus_bdf = 0x%x\n", bus_p->bus_bdf); 3082 pcie_dbg("\tbus_dev_ven_id = 0x%x\n", bus_p->bus_dev_ven_id); 3083 pcie_dbg("\tbus_rev_id = 0x%x\n", bus_p->bus_rev_id); 3084 pcie_dbg("\tbus_hdr_type = 0x%x\n", bus_p->bus_hdr_type); 3085 pcie_dbg("\tbus_dev_type = 0x%x\n", bus_p->bus_dev_type); 3086 pcie_dbg("\tbus_bdg_secbus = 0x%x\n", bus_p->bus_bdg_secbus); 3087 pcie_dbg("\tbus_pcie_off = 0x%x\n", bus_p->bus_pcie_off); 3088 pcie_dbg("\tbus_aer_off = 0x%x\n", bus_p->bus_aer_off); 3089 pcie_dbg("\tbus_pcix_off = 0x%x\n", bus_p->bus_pcix_off); 3090 pcie_dbg("\tbus_ecc_ver = 0x%x\n", bus_p->bus_ecc_ver); 3091 } 3092 3093 /* 3094 * For debugging purposes set pcie_dbg_print != 0 to see printf messages 3095 * during interrupt. 3096 * 3097 * When a proper solution is in place this code will disappear. 3098 * Potential solutions are: 3099 * o circular buffers 3100 * o taskq to print at lower pil 3101 */ 3102 int pcie_dbg_print = 0; 3103 void 3104 pcie_dbg(char *fmt, ...) 3105 { 3106 va_list ap; 3107 3108 if (!pcie_debug_flags) { 3109 return; 3110 } 3111 va_start(ap, fmt); 3112 if (servicing_interrupt()) { 3113 if (pcie_dbg_print) { 3114 prom_vprintf(fmt, ap); 3115 } 3116 } else { 3117 prom_vprintf(fmt, ap); 3118 } 3119 va_end(ap); 3120 } 3121 #endif /* DEBUG */ 3122 3123 #if defined(__x86) 3124 static void 3125 pcie_check_io_mem_range(ddi_acc_handle_t cfg_hdl, boolean_t *empty_io_range, 3126 boolean_t *empty_mem_range) 3127 { 3128 uint8_t class, subclass; 3129 uint_t val; 3130 3131 class = pci_config_get8(cfg_hdl, PCI_CONF_BASCLASS); 3132 subclass = pci_config_get8(cfg_hdl, PCI_CONF_SUBCLASS); 3133 3134 if ((class == PCI_CLASS_BRIDGE) && (subclass == PCI_BRIDGE_PCI)) { 3135 val = (((uint_t)pci_config_get8(cfg_hdl, PCI_BCNF_IO_BASE_LOW) & 3136 PCI_BCNF_IO_MASK) << 8); 3137 /* 3138 * Assuming that a zero based io_range[0] implies an 3139 * invalid I/O range. Likewise for mem_range[0]. 3140 */ 3141 if (val == 0) 3142 *empty_io_range = B_TRUE; 3143 val = (((uint_t)pci_config_get16(cfg_hdl, PCI_BCNF_MEM_BASE) & 3144 PCI_BCNF_MEM_MASK) << 16); 3145 if (val == 0) 3146 *empty_mem_range = B_TRUE; 3147 } 3148 } 3149 3150 #endif /* defined(__x86) */ 3151 3152 boolean_t 3153 pcie_link_bw_supported(dev_info_t *dip) 3154 { 3155 uint32_t linkcap; 3156 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3157 3158 if (!PCIE_IS_PCIE(bus_p)) { 3159 return (B_FALSE); 3160 } 3161 3162 if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) { 3163 return (B_FALSE); 3164 } 3165 3166 linkcap = PCIE_CAP_GET(32, bus_p, PCIE_LINKCAP); 3167 return ((linkcap & PCIE_LINKCAP_LINK_BW_NOTIFY_CAP) != 0); 3168 } 3169 3170 int 3171 pcie_link_bw_enable(dev_info_t *dip) 3172 { 3173 uint16_t linkctl; 3174 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3175 3176 if (pcie_disable_lbw != 0) { 3177 return (DDI_FAILURE); 3178 } 3179 3180 if (!pcie_link_bw_supported(dip)) { 3181 return (DDI_FAILURE); 3182 } 3183 3184 mutex_init(&bus_p->bus_lbw_mutex, NULL, MUTEX_DRIVER, NULL); 3185 cv_init(&bus_p->bus_lbw_cv, NULL, CV_DRIVER, NULL); 3186 linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL); 3187 linkctl |= PCIE_LINKCTL_LINK_BW_INTR_EN; 3188 linkctl |= PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN; 3189 PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl); 3190 3191 bus_p->bus_lbw_pbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP); 3192 bus_p->bus_lbw_cbuf = kmem_zalloc(MAXPATHLEN, KM_SLEEP); 3193 bus_p->bus_lbw_state |= PCIE_LBW_S_ENABLED; 3194 3195 return (DDI_SUCCESS); 3196 } 3197 3198 int 3199 pcie_link_bw_disable(dev_info_t *dip) 3200 { 3201 uint16_t linkctl; 3202 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3203 3204 if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) { 3205 return (DDI_FAILURE); 3206 } 3207 3208 mutex_enter(&bus_p->bus_lbw_mutex); 3209 while ((bus_p->bus_lbw_state & 3210 (PCIE_LBW_S_DISPATCHED | PCIE_LBW_S_RUNNING)) != 0) { 3211 cv_wait(&bus_p->bus_lbw_cv, &bus_p->bus_lbw_mutex); 3212 } 3213 mutex_exit(&bus_p->bus_lbw_mutex); 3214 3215 linkctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL); 3216 linkctl &= ~PCIE_LINKCTL_LINK_BW_INTR_EN; 3217 linkctl &= ~PCIE_LINKCTL_LINK_AUTO_BW_INTR_EN; 3218 PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, linkctl); 3219 3220 bus_p->bus_lbw_state &= ~PCIE_LBW_S_ENABLED; 3221 kmem_free(bus_p->bus_lbw_pbuf, MAXPATHLEN); 3222 kmem_free(bus_p->bus_lbw_cbuf, MAXPATHLEN); 3223 bus_p->bus_lbw_pbuf = NULL; 3224 bus_p->bus_lbw_cbuf = NULL; 3225 3226 mutex_destroy(&bus_p->bus_lbw_mutex); 3227 cv_destroy(&bus_p->bus_lbw_cv); 3228 3229 return (DDI_SUCCESS); 3230 } 3231 3232 void 3233 pcie_link_bw_taskq(void *arg) 3234 { 3235 dev_info_t *dip = arg; 3236 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3237 dev_info_t *cdip; 3238 boolean_t again; 3239 sysevent_t *se; 3240 sysevent_value_t se_val; 3241 sysevent_id_t eid; 3242 sysevent_attr_list_t *ev_attr_list; 3243 int circular; 3244 3245 top: 3246 ndi_devi_enter(dip, &circular); 3247 se = NULL; 3248 ev_attr_list = NULL; 3249 mutex_enter(&bus_p->bus_lbw_mutex); 3250 bus_p->bus_lbw_state &= ~PCIE_LBW_S_DISPATCHED; 3251 bus_p->bus_lbw_state |= PCIE_LBW_S_RUNNING; 3252 mutex_exit(&bus_p->bus_lbw_mutex); 3253 3254 /* 3255 * Update our own speeds as we've likely changed something. 3256 */ 3257 pcie_capture_speeds(dip); 3258 3259 /* 3260 * Walk our children. We only care about updating this on function 0 3261 * because the PCIe specification requires that these all be the same 3262 * otherwise. 3263 */ 3264 for (cdip = ddi_get_child(dip); cdip != NULL; 3265 cdip = ddi_get_next_sibling(cdip)) { 3266 pcie_bus_t *cbus_p = PCIE_DIP2BUS(cdip); 3267 3268 if (cbus_p == NULL) { 3269 continue; 3270 } 3271 3272 if ((cbus_p->bus_bdf & PCIE_REQ_ID_FUNC_MASK) != 0) { 3273 continue; 3274 } 3275 3276 /* 3277 * It's possible that this can fire while a child is otherwise 3278 * only partially constructed. Therefore, if we don't have the 3279 * config handle, don't bother updating the child. 3280 */ 3281 if (cbus_p->bus_cfg_hdl == NULL) { 3282 continue; 3283 } 3284 3285 pcie_capture_speeds(cdip); 3286 break; 3287 } 3288 3289 se = sysevent_alloc(EC_PCIE, ESC_PCIE_LINK_STATE, 3290 ILLUMOS_KERN_PUB "pcie", SE_SLEEP); 3291 3292 (void) ddi_pathname(dip, bus_p->bus_lbw_pbuf); 3293 se_val.value_type = SE_DATA_TYPE_STRING; 3294 se_val.value.sv_string = bus_p->bus_lbw_pbuf; 3295 if (sysevent_add_attr(&ev_attr_list, PCIE_EV_DETECTOR_PATH, &se_val, 3296 SE_SLEEP) != 0) { 3297 ndi_devi_exit(dip, circular); 3298 goto err; 3299 } 3300 3301 if (cdip != NULL) { 3302 (void) ddi_pathname(cdip, bus_p->bus_lbw_cbuf); 3303 3304 se_val.value_type = SE_DATA_TYPE_STRING; 3305 se_val.value.sv_string = bus_p->bus_lbw_cbuf; 3306 3307 /* 3308 * If this fails, that's OK. We'd rather get the event off and 3309 * there's a chance that there may not be anything there for us. 3310 */ 3311 (void) sysevent_add_attr(&ev_attr_list, PCIE_EV_CHILD_PATH, 3312 &se_val, SE_SLEEP); 3313 } 3314 3315 ndi_devi_exit(dip, circular); 3316 3317 /* 3318 * Before we generate and send down a sysevent, we need to tell the 3319 * system that parts of the devinfo cache need to be invalidated. While 3320 * the function below takes several args, it ignores them all. Because 3321 * this is a global invalidation, we don't bother trying to do much more 3322 * than requesting a global invalidation, lest we accidentally kick off 3323 * several in a row. 3324 */ 3325 ddi_prop_cache_invalidate(DDI_DEV_T_NONE, NULL, NULL, 0); 3326 3327 if (sysevent_attach_attributes(se, ev_attr_list) != 0) { 3328 goto err; 3329 } 3330 ev_attr_list = NULL; 3331 3332 if (log_sysevent(se, SE_SLEEP, &eid) != 0) { 3333 goto err; 3334 } 3335 3336 err: 3337 sysevent_free_attr(ev_attr_list); 3338 sysevent_free(se); 3339 3340 mutex_enter(&bus_p->bus_lbw_mutex); 3341 bus_p->bus_lbw_state &= ~PCIE_LBW_S_RUNNING; 3342 cv_broadcast(&bus_p->bus_lbw_cv); 3343 again = (bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) != 0; 3344 mutex_exit(&bus_p->bus_lbw_mutex); 3345 3346 if (again) { 3347 goto top; 3348 } 3349 } 3350 3351 int 3352 pcie_link_bw_intr(dev_info_t *dip) 3353 { 3354 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3355 uint16_t linksts; 3356 uint16_t flags = PCIE_LINKSTS_LINK_BW_MGMT | PCIE_LINKSTS_AUTO_BW; 3357 3358 if ((bus_p->bus_lbw_state & PCIE_LBW_S_ENABLED) == 0) { 3359 return (DDI_INTR_UNCLAIMED); 3360 } 3361 3362 linksts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS); 3363 if ((linksts & flags) == 0) { 3364 return (DDI_INTR_UNCLAIMED); 3365 } 3366 3367 /* 3368 * Check if we've already dispatched this event. If we have already 3369 * dispatched it, then there's nothing else to do, we coalesce multiple 3370 * events. 3371 */ 3372 mutex_enter(&bus_p->bus_lbw_mutex); 3373 bus_p->bus_lbw_nevents++; 3374 if ((bus_p->bus_lbw_state & PCIE_LBW_S_DISPATCHED) == 0) { 3375 if ((bus_p->bus_lbw_state & PCIE_LBW_S_RUNNING) == 0) { 3376 taskq_dispatch_ent(pcie_link_tq, pcie_link_bw_taskq, 3377 dip, 0, &bus_p->bus_lbw_ent); 3378 } 3379 3380 bus_p->bus_lbw_state |= PCIE_LBW_S_DISPATCHED; 3381 } 3382 mutex_exit(&bus_p->bus_lbw_mutex); 3383 3384 PCIE_CAP_PUT(16, bus_p, PCIE_LINKSTS, flags); 3385 return (DDI_INTR_CLAIMED); 3386 } 3387 3388 int 3389 pcie_link_set_target(dev_info_t *dip, pcie_link_speed_t speed) 3390 { 3391 uint16_t ctl2, rval; 3392 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3393 3394 if (!PCIE_IS_PCIE(bus_p)) { 3395 return (ENOTSUP); 3396 } 3397 3398 if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) { 3399 return (ENOTSUP); 3400 } 3401 3402 switch (speed) { 3403 case PCIE_LINK_SPEED_2_5: 3404 rval = PCIE_LINKCTL2_TARGET_SPEED_2_5; 3405 break; 3406 case PCIE_LINK_SPEED_5: 3407 rval = PCIE_LINKCTL2_TARGET_SPEED_5; 3408 break; 3409 case PCIE_LINK_SPEED_8: 3410 rval = PCIE_LINKCTL2_TARGET_SPEED_8; 3411 break; 3412 case PCIE_LINK_SPEED_16: 3413 rval = PCIE_LINKCTL2_TARGET_SPEED_16; 3414 break; 3415 default: 3416 return (EINVAL); 3417 } 3418 3419 mutex_enter(&bus_p->bus_speed_mutex); 3420 bus_p->bus_target_speed = speed; 3421 bus_p->bus_speed_flags |= PCIE_LINK_F_ADMIN_TARGET; 3422 3423 ctl2 = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL2); 3424 ctl2 &= ~PCIE_LINKCTL2_TARGET_SPEED_MASK; 3425 ctl2 |= rval; 3426 PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL2, ctl2); 3427 mutex_exit(&bus_p->bus_speed_mutex); 3428 3429 /* 3430 * Make sure our updates have been reflected in devinfo. 3431 */ 3432 pcie_capture_speeds(dip); 3433 3434 return (0); 3435 } 3436 3437 int 3438 pcie_link_retrain(dev_info_t *dip) 3439 { 3440 uint16_t ctl; 3441 pcie_bus_t *bus_p = PCIE_DIP2BUS(dip); 3442 3443 if (!PCIE_IS_PCIE(bus_p)) { 3444 return (ENOTSUP); 3445 } 3446 3447 if (!PCIE_IS_RP(bus_p) && !PCIE_IS_SWD(bus_p)) { 3448 return (ENOTSUP); 3449 } 3450 3451 /* 3452 * The PCIe specification suggests that we make sure that the link isn't 3453 * in training before issuing this command in case there was a state 3454 * machine transition prior to when we got here. We wait and then go 3455 * ahead and issue the command anyways. 3456 */ 3457 for (uint32_t i = 0; i < pcie_link_retrain_count; i++) { 3458 uint16_t sts; 3459 3460 sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS); 3461 if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0) 3462 break; 3463 delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000)); 3464 } 3465 3466 ctl = PCIE_CAP_GET(16, bus_p, PCIE_LINKCTL); 3467 ctl |= PCIE_LINKCTL_RETRAIN_LINK; 3468 PCIE_CAP_PUT(16, bus_p, PCIE_LINKCTL, ctl); 3469 3470 /* 3471 * Wait again to see if it clears before returning to the user. 3472 */ 3473 for (uint32_t i = 0; i < pcie_link_retrain_count; i++) { 3474 uint16_t sts; 3475 3476 sts = PCIE_CAP_GET(16, bus_p, PCIE_LINKSTS); 3477 if ((sts & PCIE_LINKSTS_LINK_TRAINING) == 0) 3478 break; 3479 delay(drv_usectohz(pcie_link_retrain_delay_ms * 1000)); 3480 } 3481 3482 return (0); 3483 } 3484 3485 /* 3486 * Here we're going through and grabbing information about a given PCIe device. 3487 * Our situation is a little bit complicated at this point. This gets invoked 3488 * both during early initialization and during hotplug events. We cannot rely on 3489 * the device node having been fully set up, that is, while the pcie_bus_t 3490 * normally contains a ddi_acc_handle_t for configuration space, that may not be 3491 * valid yet as this can occur before child initialization or we may be dealing 3492 * with a function that will never have a handle. 3493 * 3494 * However, we should always have a fully furnished pcie_bus_t, which means that 3495 * we can get its bdf and use that to access the devices configuration space. 3496 */ 3497 static int 3498 pcie_fabric_feature_scan(dev_info_t *dip, void *arg) 3499 { 3500 pcie_bus_t *bus_p; 3501 uint32_t devcap; 3502 uint16_t mps; 3503 dev_info_t *rcdip; 3504 pcie_fabric_data_t *fab = arg; 3505 3506 /* 3507 * Skip over non-PCIe devices. If we encounter something here, we don't 3508 * bother going through any of its children because we don't have reason 3509 * to believe that a PCIe device that this will impact will exist below 3510 * this. While it is possible that there's a PCIe fabric downstream an 3511 * intermediate old PCI/PCI-X bus, at that point, we'll still trigger 3512 * our complex fabric detection and use the minimums. 3513 * 3514 * The reason this doesn't trigger an immediate flagging as a complex 3515 * case like the one below is because we could be scanning a device that 3516 * is a nexus driver and has children already (albeit that would be 3517 * somewhat surprising as we don't anticipate being called at this 3518 * point). 3519 */ 3520 if (pcie_dev(dip) != DDI_SUCCESS) { 3521 return (DDI_WALK_PRUNECHILD); 3522 } 3523 3524 /* 3525 * If we fail to find a pcie_bus_t for some reason, that's somewhat 3526 * surprising. We log this fact and set the complex flag and indicate it 3527 * was because of this case. This immediately transitions us to a 3528 * "complex" case which means use the minimal, safe, settings. 3529 */ 3530 bus_p = PCIE_DIP2BUS(dip); 3531 if (bus_p == NULL) { 3532 dev_err(dip, CE_WARN, "failed to find associated pcie_bus_t " 3533 "during fabric scan"); 3534 fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX; 3535 return (DDI_WALK_TERMINATE); 3536 } 3537 rcdip = pcie_get_rc_dip(dip); 3538 3539 /* 3540 * First, start by determining what the device's tagging and max packet 3541 * size is. All PCIe devices will always have the 8-bit tag information 3542 * as this has existed since PCIe 1.0. 10-bit tagging requires a V2 3543 * PCIe capability. 14-bit requires the DEV3 cap. If we are missing a 3544 * version or capability, then we always treat that as lacking the bits 3545 * in the fabric. 3546 */ 3547 ASSERT3U(bus_p->bus_pcie_off, !=, 0); 3548 devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off + 3549 PCIE_DEVCAP); 3550 mps = devcap & PCIE_DEVCAP_MAX_PAYLOAD_MASK; 3551 if (mps < fab->pfd_mps_found) { 3552 fab->pfd_mps_found = mps; 3553 } 3554 3555 if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) == 0) { 3556 fab->pfd_tag_found &= ~PCIE_TAG_8B; 3557 } 3558 3559 if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0) { 3560 uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 3561 bus_p->bus_pcie_off + PCIE_DEVCAP2); 3562 if ((devcap2 & PCIE_DEVCAP2_10B_TAG_COMP_SUP) == 0) { 3563 fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP; 3564 } 3565 } else { 3566 fab->pfd_tag_found &= ~PCIE_TAG_10B_COMP; 3567 } 3568 3569 if (bus_p->bus_dev3_off != 0) { 3570 uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 3571 bus_p->bus_dev3_off + PCIE_DEVCAP3); 3572 if ((devcap3 & PCIE_DEVCAP3_14B_TAG_COMP_SUP) == 0) { 3573 fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP; 3574 } 3575 } else { 3576 fab->pfd_tag_found &= ~PCIE_TAG_14B_COMP; 3577 } 3578 3579 /* 3580 * Now that we have captured device information, we must go and ask 3581 * questions of the topology here. The big theory statement enumerates 3582 * several types of cases. The big question we need to answer is have we 3583 * encountered a hotpluggable bridge that means we need to mark this as 3584 * complex. 3585 * 3586 * The big theory statement notes several different kinds of hotplug 3587 * topologies that exist that we can theoretically support. Right now we 3588 * opt to keep our lives simple and focus solely on (4) and (5). These 3589 * can both be summarized by a single, fairly straightforward rule: 3590 * 3591 * The only allowed hotpluggable entity is a root port. 3592 * 3593 * The reason that this can work and detect cases like (6), (7), and our 3594 * other invalid ones is that the hotplug code will scan and find all 3595 * children before we are called into here. 3596 */ 3597 if (bus_p->bus_hp_sup_modes != 0) { 3598 /* 3599 * We opt to terminate in this case because there's no value in 3600 * scanning the rest of the tree at this point. 3601 */ 3602 if (!PCIE_IS_RP(bus_p)) { 3603 fab->pfd_flags |= PCIE_FABRIC_F_COMPLEX; 3604 return (DDI_WALK_TERMINATE); 3605 } 3606 3607 fab->pfd_flags |= PCIE_FABRIC_F_RP_HP; 3608 } 3609 3610 /* 3611 * As our walk starts at a root port, we need to make sure that we don't 3612 * pick up any of its siblings and their children as those would be 3613 * different PCIe fabric domains for us to scan. In many hardware 3614 * platforms multiple root ports are all at the same level in the tree. 3615 */ 3616 if (bus_p->bus_rp_dip == dip) { 3617 return (DDI_WALK_PRUNESIB); 3618 } 3619 3620 return (DDI_WALK_CONTINUE); 3621 } 3622 3623 static int 3624 pcie_fabric_feature_set(dev_info_t *dip, void *arg) 3625 { 3626 pcie_bus_t *bus_p; 3627 dev_info_t *rcdip; 3628 pcie_fabric_data_t *fab = arg; 3629 uint32_t devcap, devctl; 3630 3631 if (pcie_dev(dip) != DDI_SUCCESS) { 3632 return (DDI_WALK_PRUNECHILD); 3633 } 3634 3635 /* 3636 * The missing bus_t sent us into the complex case previously. We still 3637 * need to make sure all devices have values we expect here and thus 3638 * don't terminate like the above. 3639 */ 3640 bus_p = PCIE_DIP2BUS(dip); 3641 if (bus_p == NULL) { 3642 return (DDI_WALK_CONTINUE); 3643 } 3644 rcdip = pcie_get_rc_dip(dip); 3645 3646 devcap = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off + 3647 PCIE_DEVCAP); 3648 devctl = pci_cfgacc_get16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off + 3649 PCIE_DEVCTL); 3650 3651 if ((devcap & PCIE_DEVCAP_EXT_TAG_8BIT) != 0 && 3652 (fab->pfd_tag_act & PCIE_TAG_8B) != 0) { 3653 devctl |= PCIE_DEVCTL_EXT_TAG_FIELD_EN; 3654 } 3655 3656 devctl &= ~PCIE_DEVCTL_MAX_PAYLOAD_MASK; 3657 ASSERT0(fab->pfd_mps_act & ~PCIE_DEVCAP_MAX_PAYLOAD_MASK); 3658 devctl |= fab->pfd_mps_act << PCIE_DEVCTL_MAX_PAYLOAD_SHIFT; 3659 3660 pci_cfgacc_put16(rcdip, bus_p->bus_bdf, bus_p->bus_pcie_off + 3661 PCIE_DEVCTL, devctl); 3662 3663 if (bus_p->bus_pcie_vers == PCIE_PCIECAP_VER_2_0 && 3664 (fab->pfd_tag_act & PCIE_TAG_10B_COMP) != 0) { 3665 uint32_t devcap2 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 3666 bus_p->bus_pcie_off + PCIE_DEVCAP2); 3667 3668 if ((devcap2 & PCIE_DEVCAP2_10B_TAG_REQ_SUP) == 0) { 3669 uint16_t devctl2 = pci_cfgacc_get16(rcdip, 3670 bus_p->bus_bdf, bus_p->bus_pcie_off + PCIE_DEVCTL2); 3671 devctl2 |= PCIE_DEVCTL2_10B_TAG_REQ_EN; 3672 pci_cfgacc_put16(rcdip, bus_p->bus_bdf, 3673 bus_p->bus_pcie_off + PCIE_DEVCTL2, devctl2); 3674 } 3675 } 3676 3677 if (bus_p->bus_dev3_off != 0 && 3678 (fab->pfd_tag_act & PCIE_TAG_14B_COMP) != 0) { 3679 uint32_t devcap3 = pci_cfgacc_get32(rcdip, bus_p->bus_bdf, 3680 bus_p->bus_dev3_off + PCIE_DEVCAP3); 3681 3682 if ((devcap3 & PCIE_DEVCAP3_14B_TAG_REQ_SUP) == 0) { 3683 uint16_t devctl3 = pci_cfgacc_get16(rcdip, 3684 bus_p->bus_bdf, bus_p->bus_dev3_off + PCIE_DEVCTL3); 3685 devctl3 |= PCIE_DEVCTL3_14B_TAG_REQ_EN; 3686 pci_cfgacc_put16(rcdip, bus_p->bus_bdf, 3687 bus_p->bus_pcie_off + PCIE_DEVCTL2, devctl3); 3688 } 3689 } 3690 3691 /* 3692 * As our walk starts at a root port, we need to make sure that we don't 3693 * pick up any of its siblings and their children as those would be 3694 * different PCIe fabric domains for us to scan. In many hardware 3695 * platforms multiple root ports are all at the same level in the tree. 3696 */ 3697 if (bus_p->bus_rp_dip == dip) { 3698 return (DDI_WALK_PRUNESIB); 3699 } 3700 3701 return (DDI_WALK_CONTINUE); 3702 } 3703 3704 /* 3705 * This is used to scan and determine the total set of PCIe fabric settings that 3706 * we should have in the system for everything downstream of this specified root 3707 * port. Note, it is only really safe to call this while working from the 3708 * perspective of a root port as we will be walking down the entire device tree. 3709 * 3710 * However, our callers, particularly hoptlug, don't have all the information 3711 * we'd like. In particular, we need to check that: 3712 * 3713 * o This is actually a PCIe device. 3714 * o That this is a root port (see the big theory statement to understand this 3715 * constraint). 3716 */ 3717 void 3718 pcie_fabric_setup(dev_info_t *dip) 3719 { 3720 pcie_bus_t *bus_p; 3721 pcie_fabric_data_t *fab; 3722 dev_info_t *pdip; 3723 int circular_count; 3724 3725 bus_p = PCIE_DIP2BUS(dip); 3726 if (bus_p == NULL || !PCIE_IS_RP(bus_p)) { 3727 return; 3728 } 3729 3730 VERIFY3P(bus_p->bus_fab, !=, NULL); 3731 fab = bus_p->bus_fab; 3732 3733 /* 3734 * For us to call ddi_walk_devs(), our parent needs to be held. 3735 * ddi_walk_devs() will take care of grabbing our dip as part of its 3736 * walk before we iterate over our children. 3737 * 3738 * A reasonable question to ask here is why is it safe to ask for our 3739 * parent? In this case, because we have entered here through some 3740 * thread that's operating on us whether as part of attach or a hotplug 3741 * event, our dip somewhat by definition has to be valid. If we were 3742 * looking at our dip's children and then asking them for a parent, then 3743 * that would be a race condition. 3744 */ 3745 pdip = ddi_get_parent(dip); 3746 VERIFY3P(pdip, !=, NULL); 3747 ndi_devi_enter(pdip, &circular_count); 3748 fab->pfd_flags |= PCIE_FABRIC_F_SCANNING; 3749 3750 /* 3751 * Reinitialize the tracking structure to basically set the maximum 3752 * caps. These will be chipped away during the scan. 3753 */ 3754 fab->pfd_mps_found = PCIE_DEVCAP_MAX_PAYLOAD_4096; 3755 fab->pfd_tag_found = PCIE_TAG_ALL; 3756 fab->pfd_flags &= ~PCIE_FABRIC_F_COMPLEX; 3757 3758 ddi_walk_devs(dip, pcie_fabric_feature_scan, fab); 3759 3760 if ((fab->pfd_flags & PCIE_FABRIC_F_COMPLEX) != 0) { 3761 fab->pfd_tag_act = PCIE_TAG_5B; 3762 fab->pfd_mps_act = PCIE_DEVCAP_MAX_PAYLOAD_128; 3763 } else { 3764 fab->pfd_tag_act = fab->pfd_tag_found; 3765 fab->pfd_mps_act = fab->pfd_mps_found; 3766 } 3767 3768 ddi_walk_devs(dip, pcie_fabric_feature_set, fab); 3769 3770 fab->pfd_flags &= ~PCIE_FABRIC_F_SCANNING; 3771 ndi_devi_exit(pdip, circular_count); 3772 } 3773