=========================
CPU hotplug in the Kernel
=========================

:Date: September, 2021
:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
         Rusty Russell <rusty@rustcorp.com.au>,
         Srivatsa Vaddagiri <vatsa@in.ibm.com>,
         Ashok Raj <ashok.raj@intel.com>,
         Joel Schopp <jschopp@austin.ibm.com>,
         Thomas Gleixner <tglx@kernel.org>

Introduction
============

Modern advances in system architectures have introduced advanced error
reporting and correction capabilities in processors. A couple of OEMs
support NUMA hardware that is also hot pluggable, where physical node
insertion and removal require support for CPU hotplug.

These advances require that CPUs available to a kernel can be removed, either
for provisioning reasons or for RAS purposes to keep an offending CPU off the
system execution path. Hence the need for CPU hotplug support in the
Linux kernel.

A more novel use of CPU hotplug support today is in suspend/resume
support for SMP. Dual-core and HT support means that even a laptop runs SMP
kernels, which previously did not support these methods.


Command Line Switches
=====================

``maxcpus=n``
  Restrict boot time CPUs to *n*. For example, if you have four CPUs, using
  ``maxcpus=2`` will only boot two. You can choose to bring the
  other CPUs online later.

``nr_cpus=n``
  Restrict the total number of CPUs the kernel will support. If the number
  supplied here is lower than the number of physically available CPUs, then
  the remaining CPUs cannot be brought online later.

``possible_cpus=n``
  This option sets ``possible_cpus`` bits in ``cpu_possible_mask``.

  This option is limited to the X86 and S390 architectures.

CPU maps
========

``cpu_possible_mask``
  Bitmap of possible CPUs that can ever be available in the
  system. This is used to allocate some boot time memory for per_cpu variables
  that aren't designed to grow/shrink as CPUs are made available or removed.
  Once set during the boot time discovery phase, the map is static, i.e. no
  bits are ever added or removed. Trimming it accurately for your system needs
  upfront can save some boot time memory.

``cpu_online_mask``
  Bitmap of all CPUs currently online. It is set in ``__cpu_up()``
  after a CPU is available for kernel scheduling and ready to receive
  interrupts from devices. It is cleared when a CPU is brought down using
  ``__cpu_disable()``, before which all OS services including interrupts are
  migrated to another target CPU.

``cpu_present_mask``
  Bitmap of CPUs currently present in the system. Not all
  of them may be online. When physical hotplug is processed by the relevant
  subsystem (e.g. ACPI), the map can change: a bit is added or removed
  depending on whether the event is a hot-add or a hot-remove. There are
  currently no locking rules. Typical usage is to initialize the topology
  during boot, at which time hotplug is disabled.

You really don't need to manipulate any of the system CPU maps. They should
be read-only for most uses. When setting up per-CPU resources, almost always
use ``cpu_possible_mask`` or ``for_each_possible_cpu()`` to iterate. The macro
``for_each_cpu()`` can be used to iterate over a custom CPU mask.

Never use anything other than ``cpumask_t`` to represent a bitmap of CPUs.


Using CPU hotplug
=================

The kernel option *CONFIG_HOTPLUG_CPU* needs to be enabled. It is currently
available on multiple architectures including ARM, MIPS, PowerPC and X86. The
configuration is done via the sysfs interface::

  $ ls -lh /sys/devices/system/cpu
  total 0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu0
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu1
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu2
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu3
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu4
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu5
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu6
  drwxr-xr-x  9 root root    0 Dec 21 16:33 cpu7
  drwxr-xr-x  2 root root    0 Dec 21 16:33 hotplug
  -r--r--r--  1 root root 4.0K Dec 21 16:33 offline
  -r--r--r--  1 root root 4.0K Dec 21 16:33 online
  -r--r--r--  1 root root 4.0K Dec 21 16:33 possible
  -r--r--r--  1 root root 4.0K Dec 21 16:33 present

The files *offline*, *online*, *possible*, *present* represent the CPU masks.
Each CPU folder contains an *online* file which controls the logical on (1) and
off (0) state. To logically shut down CPU4::

  $ echo 0 > /sys/devices/system/cpu/cpu4/online
  smpboot: CPU 4 is now offline

Once the CPU is shut down, it will be removed from */proc/interrupts* and
*/proc/cpuinfo*, and it should no longer be visible in the *top* command. To
bring CPU4 back online::

  $ echo 1 > /sys/devices/system/cpu/cpu4/online
  smpboot: Booting Node 0 Processor 4 APIC 0x1

The CPU is usable again. This should work on all CPUs, but CPU0 is often special
and excluded from CPU hotplug.

The CPU hotplug coordination
============================

The offline case
----------------

Once a CPU has been logically shut down, the teardown callbacks of registered
hotplug states will be invoked, starting with ``CPUHP_ONLINE`` and terminating
at state ``CPUHP_OFFLINE``. This includes:

* If tasks are frozen due to a suspend operation then *cpuhp_tasks_frozen*
  will be set to true.
* All processes are migrated away from this outgoing CPU to new CPUs.
  The new CPU is chosen from each process' current cpuset, which may be
  a subset of all online CPUs.
* All interrupts targeted to this CPU are migrated to a new CPU.
* Timers are also migrated to a new CPU.
* Once all services are migrated, the kernel calls an arch-specific routine
  ``__cpu_disable()`` to perform arch-specific cleanup.


The CPU hotplug API
===================

CPU hotplug state machine
-------------------------

CPU hotplug uses a trivial state machine with a linear state space from
CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown
callback.

When a CPU is onlined, the startup callbacks are invoked sequentially until
the state CPUHP_ONLINE is reached. They can also be invoked when the
callbacks of a state are set up or an instance is added to a multi-instance
state.

When a CPU is offlined, the teardown callbacks are invoked sequentially in
reverse order until the state CPUHP_OFFLINE is reached. They can also
be invoked when the callbacks of a state are removed or an instance is
removed from a multi-instance state.

If a usage site requires a callback in only one direction of the hotplug
operations (CPU online or CPU offline), then the callback that is not
required can be set to NULL when the state is set up.

The state space is divided into three sections:

* The PREPARE section

  The PREPARE section covers the state space from CPUHP_OFFLINE to
  CPUHP_BRINGUP_CPU.

  The startup callbacks in this section are invoked before the CPU is
  started during a CPU online operation. The teardown callbacks are invoked
  after the CPU has become dysfunctional during a CPU offline operation.

  The callbacks are invoked on a control CPU, as they obviously can't run
  on the hotplugged CPU, which is either not yet started or has already
  become dysfunctional.

  The startup callbacks are used to set up resources which are required to
  bring a CPU successfully online. The teardown callbacks are used to free
  resources or to move pending work to an online CPU after the hotplugged
  CPU has become dysfunctional.

  The startup callbacks are allowed to fail. If a callback fails, the CPU
  online operation is aborted and the CPU is brought down to the previous
  state (usually CPUHP_OFFLINE) again.

  The teardown callbacks in this section are not allowed to fail.

* The STARTING section

  The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1
  and CPUHP_AP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  with interrupts disabled during a CPU online operation in the early CPU
  setup code. The teardown callbacks are invoked with interrupts disabled
  on the hotplugged CPU during a CPU offline operation shortly before the
  CPU is completely shut down.

  The callbacks in this section are not allowed to fail.

  The callbacks are used for low level hardware initialization/shutdown and
  for core subsystems.

* The ONLINE section

  The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and
  CPUHP_ONLINE.

  The startup callbacks in this section are invoked on the hotplugged CPU
  during a CPU online operation. The teardown callbacks are invoked on the
  hotplugged CPU during a CPU offline operation.

  The callbacks are invoked in the context of the per-CPU hotplug thread,
  which is pinned on the hotplugged CPU. The callbacks are invoked with
  interrupts and preemption enabled.

  The callbacks are allowed to fail. When a callback fails, the hotplug
  operation is aborted and the CPU is brought back to the previous state.
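
The ordering and rollback rules above can be tried out in a small user-space
model. This is a sketch under stated assumptions, not the implementation in
kernel/cpu.c: ``struct toy_state``, ``toy_up()`` and ``toy_down()`` are
hypothetical, a single table stands in for all three sections, and teardown
callbacks are assumed never to fail. Entering state *n* runs the startup
callback of state *n*, leaving it runs its teardown callback, NULL callbacks
are skipped, and a failing startup callback rolls the CPU back to the state it
started from:

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <string.h>

   /* Callback type of the toy model; kernel callbacks also receive the CPU. */
   typedef int (*toy_cb)(unsigned int cpu);

   /* One entry of a hypothetical linear state table. */
   struct toy_state {
       const char *name;
       toy_cb startup;   /* NULL: step is skipped on the way up */
       toy_cb teardown;  /* NULL: step is skipped on the way down */
   };

   /* Trace of invoked callbacks: "+N" = startup of N, "-N" = teardown of N. */
   static char toy_log[256];

   static void log_step(char dir, int st)
   {
       char buf[16];

       snprintf(buf, sizeof(buf), "%c%d ", dir, st);
       strncat(toy_log, buf, sizeof(toy_log) - strlen(toy_log) - 1);
   }

   /* Go down from state 'cur' to 'target'. Leaving state st runs
    * tbl[st].teardown. This model assumes teardown callbacks never fail. */
   static int toy_down(const struct toy_state *tbl, int cur, int target)
   {
       for (; cur > target; cur--) {
           if (tbl[cur].teardown) {
               log_step('-', cur);
               tbl[cur].teardown(0);
           }
       }
       return cur;
   }

   /* Go up from state 'cur' to 'target'. Entering state st runs
    * tbl[st].startup. On failure, roll back to the starting state. */
   static int toy_up(const struct toy_state *tbl, int cur, int target, int *err)
   {
       int start = cur;

       *err = 0;
       for (; cur < target; cur++) {
           toy_cb cb = tbl[cur + 1].startup;

           if (!cb)
               continue;
           log_step('+', cur + 1);
           *err = cb(0);
           if (*err)
               return toy_down(tbl, cur, start);
       }
       return cur;
   }

   static int ok(unsigned int cpu) { (void)cpu; return 0; }
   static int bad(unsigned int cpu) { (void)cpu; return -1; }

   int main(void)
   {
       /* Index 0 stands in for CPUHP_OFFLINE, index 4 for CPUHP_ONLINE. */
       const struct toy_state good[] = {
           { "offline", NULL, NULL }, { "a", ok, ok }, { "b", ok, NULL },
           { "c", NULL, ok }, { "online", ok, ok },
       };
       const struct toy_state failing[] = {
           { "offline", NULL, NULL }, { "a", ok, ok }, { "b", ok, ok },
           { "c", bad, ok }, { "online", ok, ok },
       };
       int err, st;

       st = toy_up(good, 0, 4, &err);
       assert(st == 4 && err == 0);
       st = toy_down(good, 4, 0);
       assert(st == 0);
       printf("%s\n", toy_log);    /* +1 +2 +4 -4 -3 -1 */

       toy_log[0] = '\0';
       st = toy_up(failing, 0, 4, &err);
       assert(st == 0 && err == -1);
       printf("%s\n", toy_log);    /* +1 +2 +3 -2 -1 */
       return 0;
   }

The first run shows the skipped NULL callbacks in both directions; the second
shows a failed online operation being rolled back in reverse order, much like
the failed online trace for CPU online/offline operations.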

CPU online/offline operations
-----------------------------

A successful online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_ONLINE - 1]->startup()        -> success
  [CPUHP_ONLINE]

A successful offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()    -> success
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()        -> success
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  ...
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed online operation looks like this::

  [CPUHP_OFFLINE]
  [CPUHP_OFFLINE + 1]->startup()       -> success
  [CPUHP_OFFLINE + 2]->startup()       -> success
  [CPUHP_OFFLINE + 3]                  -> skipped because startup == NULL
  ...
  [CPUHP_BRINGUP_CPU]->startup()       -> success
  === End of PREPARE section
  [CPUHP_BRINGUP_CPU + 1]->startup()   -> success
  ...
  [CPUHP_AP_ONLINE]->startup()         -> success
  === End of STARTING section
  [CPUHP_AP_ONLINE + 1]->startup()     -> success
  ...
  [CPUHP_AP_ONLINE + N]->startup()     -> fail
  [CPUHP_AP_ONLINE + (N - 1)]->teardown()
  ...
  [CPUHP_AP_ONLINE + 1]->teardown()
  === Start of STARTING section
  [CPUHP_AP_ONLINE]->teardown()
  ...
  [CPUHP_BRINGUP_CPU + 1]->teardown()
  === Start of PREPARE section
  [CPUHP_BRINGUP_CPU]->teardown()
  ...
  [CPUHP_OFFLINE + 3]->teardown()
  [CPUHP_OFFLINE + 2]                  -> skipped because teardown == NULL
  [CPUHP_OFFLINE + 1]->teardown()
  [CPUHP_OFFLINE]

A failed offline operation looks like this::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()
  ...
  [CPUHP_ONLINE - 1]->startup()
  [CPUHP_ONLINE]

Recursive failures cannot be handled sensibly. Consider the following
example of a recursive failure caused by a failed offline operation::

  [CPUHP_ONLINE]
  [CPUHP_ONLINE - 1]->teardown()       -> success
  ...
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail

The CPU hotplug state machine stops right here and does not try to go back
down again, because that would likely result in an endless loop::

  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail
  [CPUHP_ONLINE - (N - 1)]->startup()  -> success
  [CPUHP_ONLINE - (N - 2)]->startup()  -> fail
  [CPUHP_ONLINE - (N - 1)]->teardown() -> success
  [CPUHP_ONLINE - N]->teardown()       -> fail

Lather, rinse and repeat. In this case the CPU is left in state::

  [CPUHP_ONLINE - (N - 1)]

which at least lets the system make progress and gives the user a chance to
debug or even resolve the situation.

Allocating a state
------------------

There are two ways to allocate a CPU hotplug state:

* Static allocation

  Static allocation has to be used when the subsystem or driver has
  ordering requirements versus other CPU hotplug states. E.g. the PERF core
  startup callback has to be invoked before the PERF driver startup
  callbacks during a CPU online operation. During a CPU offline operation
  the driver teardown callbacks have to be invoked before the core teardown
  callback. The statically allocated states are described by constants in
  the cpuhp_state enum which can be found in include/linux/cpuhotplug.h.

  Insert the state into the enum at the proper place so the ordering
  requirements are fulfilled. The state constant has to be used for state
  setup and removal.

  Static allocation is also required when the state callbacks are not set
  up at runtime and are part of the initializer of the CPU hotplug state
  array in kernel/cpu.c.

* Dynamic allocation

  When there are no ordering requirements for the state callbacks, then
  dynamic allocation is the preferred method. The state number is allocated
  by the setup function and returned to the caller on success.

  Only the PREPARE and ONLINE sections provide a dynamic allocation
  range. The STARTING section does not, as most of the callbacks in that
  section have explicit ordering requirements.

Setup of a CPU hotplug state
----------------------------

The core code provides the following functions to set up a state:

* cpuhp_setup_state(state, name, startup, teardown)
* cpuhp_setup_state_nocalls(state, name, startup, teardown)
* cpuhp_setup_state_cpuslocked(state, name, startup, teardown)
* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown)

For cases where a driver or a subsystem has multiple instances and the same
CPU hotplug state callbacks need to be invoked for each instance, the CPU
hotplug core provides multi-instance support. The advantage over
driver-specific instance lists is that the instance-related functions are
fully serialized against CPU hotplug operations and provide automatic
invocation of the state callbacks on addition and removal. To set up such a
multi-instance state, the following function is available:

* cpuhp_setup_state_multi(state, name, startup, teardown)

The @state argument is either a statically allocated state or one of the
constants for dynamically allocated states (CPUHP_BP_PREPARE_DYN or
CPUHP_AP_ONLINE_DYN), depending on the state section (PREPARE, ONLINE) for
which a dynamic state should be allocated.

The @name argument is used for sysfs output and for instrumentation. The
naming convention is "subsys:mode" or "subsys/driver:mode",
e.g. "perf:mode" or "perf/x86:mode". The common mode names are:

======== =======================================================
prepare  For states in the PREPARE section

dead     For states in the PREPARE section which do not provide
         a startup callback

starting For states in the STARTING section

dying    For states in the STARTING section which do not provide
         a startup callback

online   For states in the ONLINE section

offline  For states in the ONLINE section which do not provide
         a startup callback
======== =======================================================

As the @name argument is only used for sysfs and instrumentation, other mode
descriptors can be used as well if they describe the nature of the state
better than the common ones.

Examples for @name arguments: "perf/online", "perf/x86:prepare",
"RCU/tree:dying", "sched/waitempty"

The @startup argument is a function pointer to the callback which should be
invoked during a CPU online operation. If the usage site does not require a
startup callback, set the pointer to NULL.

The @teardown argument is a function pointer to the callback which should
be invoked during a CPU offline operation. If the usage site does not
require a teardown callback, set the pointer to NULL.
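
The "subsys:mode" / "subsys/driver:mode" convention and the common mode names
listed above can be captured in a small helper. This is a hypothetical
user-space sketch: ``enum toy_section``, ``common_mode()`` and
``format_state_name()`` are not kernel interfaces, and since @name is only
used for sysfs and instrumentation, a caller may still pick a more
descriptive mode:

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical names for the three sections of the state space. */
   enum toy_section { TOY_PREPARE, TOY_STARTING, TOY_ONLINE };

   /* Common mode name from the table; the second name is used for states
    * which do not provide a startup callback. */
   static const char *common_mode(enum toy_section sec, bool has_startup)
   {
       switch (sec) {
       case TOY_PREPARE:
           return has_startup ? "prepare" : "dead";
       case TOY_STARTING:
           return has_startup ? "starting" : "dying";
       case TOY_ONLINE:
           return has_startup ? "online" : "offline";
       }
       return "unknown";
   }

   /* Build "subsys:mode", or "subsys/driver:mode" when driver is not NULL.
    * Returns the snprintf() result; a value >= len means truncation. */
   static int format_state_name(char *buf, size_t len, const char *subsys,
                                const char *driver, enum toy_section sec,
                                bool has_startup)
   {
       const char *mode = common_mode(sec, has_startup);

       if (driver)
           return snprintf(buf, len, "%s/%s:%s", subsys, driver, mode);
       return snprintf(buf, len, "%s:%s", subsys, mode);
   }

   int main(void)
   {
       char name[64];

       format_state_name(name, sizeof(name), "perf", "x86", TOY_PREPARE, true);
       assert(strcmp(name, "perf/x86:prepare") == 0);
       puts(name);

       /* A teardown-only state in the ONLINE section. */
       format_state_name(name, sizeof(name), "subsys", NULL, TOY_ONLINE, false);
       assert(strcmp(name, "subsys:offline") == 0);
       puts(name);
       return 0;
   }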

The functions differ in how the installed callbacks are treated:

  * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked()
    and cpuhp_setup_state_multi() only install the callbacks.

  * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the
    callbacks and invoke the @startup callback (if not NULL) for all online
    CPUs which currently have a state greater than the newly installed
    state. Depending on the state section, the callback is either invoked on
    the current CPU (PREPARE section) or on each online CPU (ONLINE
    section) in the context of the CPU's hotplug thread.

    If a callback fails for CPU N, then the teardown callback for CPUs
    0 .. N-1 is invoked to roll back the operation. The state setup fails,
    the callbacks for the state are not installed and, in case of dynamic
    allocation, the allocated state is freed.

The state setup and the callback invocations are serialized against CPU
hotplug operations. If the setup function has to be called from a CPU
hotplug read-locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

The function return values:

  ======== ===================================================================
  0        Statically allocated state was successfully set up

  >0       Dynamically allocated state was successfully set up.

           The returned number is the state number which was allocated. If
           the state callbacks have to be removed later, e.g. on module
           removal, then this number has to be saved by the caller and used
           as the @state argument for the state remove function. For
           multi-instance states the dynamically allocated state number is
           also required as the @state argument for the instance add/remove
           operations.

  <0       Operation failed
  ======== ===================================================================

Removal of a CPU hotplug state
------------------------------

To remove a previously set up state, the following functions are provided:

* cpuhp_remove_state(state)
* cpuhp_remove_state_nocalls(state)
* cpuhp_remove_state_nocalls_cpuslocked(state)
* cpuhp_remove_multi_state(state)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state*(). If
the state is in the dynamic range, then the state number is freed and
available for dynamic allocation again.

The functions differ in how the installed callbacks are treated:

  * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked()
    and cpuhp_remove_multi_state() only remove the callbacks.

  * cpuhp_remove_state() removes the callbacks and invokes the teardown
    callback (if not NULL) for all online CPUs which currently have a state
    greater than the removed state. Depending on the state section, the
    callback is either invoked on the current CPU (PREPARE section) or on
    each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    In order to complete the removal, the teardown callback should not fail.

The state removal and the callback invocations are serialized against CPU
hotplug operations. If the remove function has to be called from a CPU
hotplug read-locked region, then the _cpuslocked() variants have to be
used. These functions cannot be used from within CPU hotplug callbacks.

If a multi-instance state is removed, then the caller has to remove all
instances first.
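
Dynamic allocation and removal can be pictured as handing out the first free
number in a reserved range and returning it when the state is removed. The
model below is an assumption made for illustration, not the kernel's code:
``TOY_DYN_FIRST``, ``TOY_DYN_LAST`` and the ``toy_*`` functions are
hypothetical stand-ins for a range such as the one behind CPUHP_AP_ONLINE_DYN,
and returning ``-ENOSPC`` for an exhausted range is a choice of this model.
Real callers must always store and use the number returned by
cpuhp_setup_state*():

.. code-block:: c

   #include <assert.h>
   #include <errno.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Hypothetical dynamic range; the numbers are made up for illustration. */
   #define TOY_DYN_FIRST 100
   #define TOY_DYN_LAST  103   /* inclusive */

   /* A state number is in use while it has a name attached. */
   static const char *toy_names[TOY_DYN_LAST + 1];

   /* Hand out the first free number in the range, or -ENOSPC if none is free. */
   static int toy_alloc_state(const char *name)
   {
       for (int st = TOY_DYN_FIRST; st <= TOY_DYN_LAST; st++) {
           if (!toy_names[st]) {
               toy_names[st] = name;
               return st;      /* > 0: the caller must keep this number */
           }
       }
       return -ENOSPC;
   }

   /* Release a dynamically allocated number so it can be reused. */
   static void toy_remove_state(int st)
   {
       if (st >= TOY_DYN_FIRST && st <= TOY_DYN_LAST)
           toy_names[st] = NULL;
   }

   int main(void)
   {
       int a = toy_alloc_state("subsys:online");
       int b = toy_alloc_state("other:online");
       int c;

       assert(a == 100 && b == 101);
       toy_remove_state(a);            /* e.g. on module removal */
       c = toy_alloc_state("third:online");
       assert(c == 100);               /* the freed number is reused */
       printf("a=%d b=%d c=%d\n", a, b, c);   /* a=100 b=101 c=100 */
       return 0;
   }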

Multi-Instance state instance management
----------------------------------------

Once the multi-instance state is set up, instances can be added to the
state:

  * cpuhp_state_add_instance(state, node)
  * cpuhp_state_add_instance_nocalls(state, node)

The @state argument is either a statically allocated state or the state
number which was allocated in the dynamic range by cpuhp_setup_state_multi().

The @node argument is a pointer to an hlist_node which is embedded in the
instance's data structure. The pointer is handed to the multi-instance
state callbacks and can be used by the callback to retrieve the instance
via container_of().

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_add_instance_nocalls() only adds the instance to the
    multi-instance state's node list.

  * cpuhp_state_add_instance() adds the instance and invokes the startup
    callback (if not NULL) associated with @state for all online CPUs which
    currently have a state greater than @state. The callback is only
    invoked for the instance being added. Depending on the state section,
    the callback is either invoked on the current CPU (PREPARE section) or
    on each online CPU (ONLINE section) in the context of the CPU's hotplug
    thread.

    If a callback fails for CPU N, then the teardown callback for CPUs
    0 .. N-1 is invoked to roll back the operation, the function fails and
    the instance is not added to the node list of the multi-instance state.

To remove an instance from the state's node list, these functions are
available:

  * cpuhp_state_remove_instance(state, node)
  * cpuhp_state_remove_instance_nocalls(state, node)

The arguments are the same as for the cpuhp_state_add_instance*()
variants above.
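
The container_of() lookup from the embedded @node can be tried out in user
space. This sketch assumes a minimal stand-in for ``struct hlist_node`` and a
user-space container_of() built on offsetof(); ``struct my_instance`` and
``my_instance_online()`` are hypothetical. The callback has the same shape as
a multi-instance startup callback, ``int (*)(unsigned int cpu, struct
hlist_node *node)``, and the calls in main() stand in for the invocations the
hotplug core would make:

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Minimal stand-in for the kernel's struct hlist_node. */
   struct hlist_node {
       struct hlist_node *next, **pprev;
   };

   /* User-space container_of(): step back from the member's address to the
    * start of the enclosing structure. */
   #define container_of(ptr, type, member) \
       ((type *)((char *)(ptr) - offsetof(type, member)))

   /* Hypothetical per-instance data with the hotplug node embedded in it. */
   struct my_instance {
       int id;
       unsigned int cpus_seen;
       struct hlist_node node;   /* &inst->node is what gets registered */
   };

   /* Same shape as a multi-instance startup callback. */
   static int my_instance_online(unsigned int cpu, struct hlist_node *node)
   {
       struct my_instance *inst = container_of(node, struct my_instance, node);

       inst->cpus_seen++;
       printf("instance %d: CPU %u online\n", inst->id, cpu);
       return 0;
   }

   int main(void)
   {
       struct my_instance inst1 = { .id = 1 };
       struct my_instance inst2 = { .id = 2 };

       /* The hotplug core would pass &instN.node for each online CPU. */
       my_instance_online(4, &inst1.node);
       my_instance_online(4, &inst2.node);
       my_instance_online(5, &inst2.node);
       assert(inst1.cpus_seen == 1 && inst2.cpus_seen == 2);
       return 0;
   }

Embedding the node, rather than storing a pointer to the instance next to it,
is what lets a single callback serve every instance without a separate lookup.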

The functions differ in how the installed callbacks are treated:

  * cpuhp_state_remove_instance_nocalls() only removes the instance from the
    state's node list.

  * cpuhp_state_remove_instance() removes the instance and invokes the
    teardown callback (if not NULL) associated with @state for all online
    CPUs which currently have a state greater than @state. The callback is
    only invoked for the instance being removed. Depending on the state
    section, the callback is either invoked on the current CPU (PREPARE
    section) or on each online CPU (ONLINE section) in the context of the
    CPU's hotplug thread.

    In order to complete the removal, the teardown callback should not fail.

The node list add/remove operations and the callback invocations are
serialized against CPU hotplug operations. These functions cannot be used
from within CPU hotplug callbacks and CPU hotplug read-locked regions.

Examples
--------

Set up and tear down a statically allocated state in the STARTING section for
notifications on online and offline operations::

  ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying);
  if (ret < 0)
          return ret;
  ....
  cpuhp_remove_state(CPUHP_SUBSYS_STARTING);

Set up and tear down a dynamically allocated state in the ONLINE section
for notifications on offline operations::

  state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline);
  if (state < 0)
          return state;
  ....
  cpuhp_remove_state(state);

Set up and tear down a dynamically allocated state in the ONLINE section
for notifications on online operations without invoking the callbacks::

  state = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL);
  if (state < 0)
          return state;
  ....
  cpuhp_remove_state_nocalls(state);

Set up, use and tear down a dynamically allocated multi-instance state in the
ONLINE section for notifications on online and offline operations::

  state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline);
  if (state < 0)
          return state;
  ....
  ret = cpuhp_state_add_instance(state, &inst1->node);
  if (ret)
          return ret;
  ....
  ret = cpuhp_state_add_instance(state, &inst2->node);
  if (ret)
          return ret;
  ....
  cpuhp_state_remove_instance(state, &inst1->node);
  ....
  cpuhp_state_remove_instance(state, &inst2->node);
  ....
  cpuhp_remove_multi_state(state);


Testing of hotplug states
=========================

One way to verify whether a custom state is working as expected or not is to
shut down a CPU and then put it online again. It is also possible to put the
CPU into a certain state (for instance *CPUHP_AP_ONLINE*) and then go back to
*CPUHP_ONLINE*. This would simulate an error one state after *CPUHP_AP_ONLINE*,
which would lead to a rollback to the online state.

All registered states are enumerated in ``/sys/devices/system/cpu/hotplug/states`` ::

  $ tail /sys/devices/system/cpu/hotplug/states
  138: mm/vmscan:online
  139: mm/vmstat:online
  140: lib/percpu_cnt:online
  141: acpi/cpu-drv:online
  142: base/cacheinfo:online
  143: virtio/net:online
  144: x86/mce:online
  145: printk:online
  168: sched:active
  169: online

To roll CPU4 back to ``lib/percpu_cnt:online``, just issue::

  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169
  $ echo 140 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  140

It is important to note that the teardown callbacks (where present) of all
states above 140 have been invoked, while state 140 itself remains set up,
since CPU4 is now in that state. And now bring it back online::

  $ echo 169 > /sys/devices/system/cpu/cpu4/hotplug/target
  $ cat /sys/devices/system/cpu/cpu4/hotplug/state
  169

With trace events enabled, the individual steps are visible, too::

      #            TASK-PID   CPU#  TIMESTAMP  FUNCTION
      #               | |       |      |         |
            bash-394  [001]  22.976: cpuhp_enter: cpu: 0004 target: 140 step: 169 (cpuhp_kick_ap_work)
         cpuhp/4-31   [004]  22.977: cpuhp_enter: cpu: 0004 target: 140 step: 168 (sched_cpu_deactivate)
         cpuhp/4-31   [004]  22.990: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
         cpuhp/4-31   [004]  22.991: cpuhp_enter: cpu: 0004 target: 140 step: 144 (mce_cpu_pre_down)
         cpuhp/4-31   [004]  22.992: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
         cpuhp/4-31   [004]  22.993: cpuhp_multi_enter: cpu: 0004 target: 140 step: 143 (virtnet_cpu_down_prep)
         cpuhp/4-31   [004]  22.994: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
         cpuhp/4-31   [004]  22.995: cpuhp_enter: cpu: 0004 target: 140 step: 142 (cacheinfo_cpu_pre_down)
         cpuhp/4-31   [004]  22.996: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
            bash-394  [001]  22.997: cpuhp_exit:  cpu: 0004  state: 140 step: 169 ret: 0
            bash-394  [005]  95.540: cpuhp_enter: cpu: 0004 target: 169 step: 140 (cpuhp_kick_ap_work)
         cpuhp/4-31   [004]  95.541: cpuhp_enter: cpu: 0004 target: 169 step: 141 (acpi_soft_cpu_online)
         cpuhp/4-31   [004]  95.542: cpuhp_exit:  cpu: 0004  state: 141 step: 141 ret: 0
         cpuhp/4-31   [004]  95.543: cpuhp_enter: cpu: 0004 target: 169 step: 142 (cacheinfo_cpu_online)
         cpuhp/4-31   [004]  95.544: cpuhp_exit:  cpu: 0004  state: 142 step: 142 ret: 0
         cpuhp/4-31   [004]  95.545: cpuhp_multi_enter: cpu: 0004 target: 169 step: 143 (virtnet_cpu_online)
         cpuhp/4-31   [004]  95.546: cpuhp_exit:  cpu: 0004  state: 143 step: 143 ret: 0
         cpuhp/4-31   [004]  95.547: cpuhp_enter: cpu: 0004 target: 169 step: 144 (mce_cpu_online)
         cpuhp/4-31   [004]  95.548: cpuhp_exit:  cpu: 0004  state: 144 step: 144 ret: 0
         cpuhp/4-31   [004]  95.549: cpuhp_enter: cpu: 0004 target: 169 step: 145 (console_cpu_notify)
         cpuhp/4-31   [004]  95.550: cpuhp_exit:  cpu: 0004  state: 145 step: 145 ret: 0
         cpuhp/4-31   [004]  95.551: cpuhp_enter: cpu: 0004 target: 169 step: 168 (sched_cpu_activate)
         cpuhp/4-31   [004]  95.552: cpuhp_exit:  cpu: 0004  state: 168 step: 168 ret: 0
            bash-394  [005]  95.553: cpuhp_exit:  cpu: 0004  state: 169 step: 140 ret: 0

As can be seen, CPU4 went down until timestamp 22.996 and then back up until
95.552. All invoked callbacks including their return codes are visible in the
trace.

Architecture's requirements
===========================

The following functions and configurations are required:

``CONFIG_HOTPLUG_CPU``
  This entry needs to be enabled in Kconfig.

``__cpu_up()``
  Arch interface to bring up a CPU.

``__cpu_disable()``
  Arch interface to shut down a CPU. No more interrupts can be handled by the
  kernel after the routine returns. This includes the shutdown of the timer.

``__cpu_die()``
  This is supposed to ensure the death of the CPU. Look at the example code
  in other architectures that implement CPU hotplug. The processor is taken
  down from the ``idle()`` loop for that specific architecture.
  ``__cpu_die()`` typically waits for some per_cpu state to be set, to
  positively ensure that the processor's dead routine has been called.

User Space Notification
=======================

After a CPU has been successfully onlined or offlined, udev events are sent.
A udev rule like::

  SUBSYSTEM=="cpu", DRIVERS=="processor", DEVPATH=="/devices/system/cpu/*", RUN+="the_hotplug_receiver.sh"

will receive all events. A script like::

  #!/bin/sh

  if [ "${ACTION}" = "offline" ]
  then
      echo "CPU ${DEVPATH##*/} offline"

  elif [ "${ACTION}" = "online" ]
  then
      echo "CPU ${DEVPATH##*/} online"

  fi

can process the event further.
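
Where a shell is not wanted, the same handler can be written in C. This is a
hedged sketch: like the script above, it assumes that udev passes ``ACTION``
and ``DEVPATH`` to the RUN program in the environment, and
``describe_cpu_event()`` is a hypothetical helper:

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   /* Produce the same text as the shell script: "CPU <name> <action>".
    * Returns 0 when a message was written, -1 for events that are ignored. */
   static int describe_cpu_event(const char *action, const char *devpath,
                                 char *buf, size_t len)
   {
       const char *name;

       if (!action || !devpath)
           return -1;
       if (strcmp(action, "offline") != 0 && strcmp(action, "online") != 0)
           return -1;
       /* Keep only what follows the last '/', like the script's DEVPATH
        * expansion does. */
       name = strrchr(devpath, '/');
       name = name ? name + 1 : devpath;
       snprintf(buf, len, "CPU %s %s", name, action);
       return 0;
   }

   int main(void)
   {
       char msg[128];

       /* ACTION and DEVPATH are the event properties the script relies on. */
       if (describe_cpu_event(getenv("ACTION"), getenv("DEVPATH"),
                              msg, sizeof(msg)) == 0)
           puts(msg);
       return 0;
   }

Referenced from the RUN key of the rule instead of the script, it prints
``CPU cpu4 offline`` for an offline event of CPU4.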

When changes to the CPUs in the system occur, the sysfs file
/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
updates the kdump capture kernel list of CPUs itself (via elfcorehdr and
other relevant kexec segments), or '0' if userspace must update the kdump
capture kernel list of CPUs.

The availability of this file depends on the CONFIG_HOTPLUG_CPU kernel
configuration option.

To skip userspace processing of CPU hot un/plug events for kdump
(i.e. the unload-then-reload to obtain a current list of CPUs), this sysfs
file can be used in a udev rule as follows::

  SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

For a CPU hot un/plug event, if the architecture supports kernel updates
of the elfcorehdr (which contains the list of CPUs) and other relevant
kexec segments, then the rule skips the unload-then-reload of the kdump
capture kernel.

Kernel Inline Documentation Reference
=====================================

.. kernel-doc:: include/linux/cpuhotplug.h