.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

Memory notifier
---------------

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available in order to be able to
  prepare subsystems to handle memory. The page allocator is still unable
  to allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has successfully been brought online. The callback may
  allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer of struct memory_notify::

  struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
  };

- start_pfn is the first PFN of the memory being onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.

It is possible to get notified for MEM_CANCEL_ONLINE without having been notified
for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
MEM_GOING_OFFLINE.
This can happen when a consumer fails, meaning we break the callchain and we
stop calling the remaining consumers of the notifier.
It is then important that users of memory_notify make no assumptions and get
prepared to handle such cases.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

NUMA node notifier
------------------

There are six types of notification defined in ``include/linux/node.h``:

NODE_ADDING_FIRST_MEMORY
  Generated before memory becomes available to this node for the first time.

NODE_CANCEL_ADDING_FIRST_MEMORY
  Generated if NODE_ADDING_FIRST_MEMORY fails.

NODE_ADDED_FIRST_MEMORY
  Generated when memory has become available for this node for the first time.

NODE_REMOVING_LAST_MEMORY
  Generated when the last memory available to this node is about to be offlined.

NODE_CANCEL_REMOVING_LAST_MEMORY
  Generated if NODE_REMOVING_LAST_MEMORY fails.

NODE_REMOVED_LAST_MEMORY
  Generated when the last memory available to this node has been offlined.

A callback routine can be registered by calling::

  hotplug_node_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer of struct node_notify::

  struct node_notify {
    int nid;
  };

- nid is the node we are adding memory to or removing memory from.

It is possible to get notified for NODE_CANCEL_ADDING_FIRST_MEMORY without
having been notified for NODE_ADDING_FIRST_MEMORY, and the same applies to
NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
This can happen when a consumer fails, meaning we break the callchain and we
stop calling the remaining consumers of the notifier.
It is then important that users of node_notify make no assumptions and get
prepared to handle such cases.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
It stops further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Please note that we should not fail for NODE_ADDED_FIRST_MEMORY /
NODE_REMOVED_LAST_MEMORY, as memory_hotplug code cannot roll back at that
point anymore.
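
The "cancel without a preceding notification" case described above can be
hard to picture, so here is a minimal userspace sketch of it. It is not
kernel code: the ``model_*`` names, the event enum and the fixed-size chain
are hypothetical stand-ins invented for illustration, and the failure check
is simplified to a direct comparison with NOTIFY_BAD. Only the NOTIFY_*
values are taken from ``include/linux/notifier.h``.

```c
/* Userspace model of a priority-ordered notifier chain (NOT kernel code). */
#include <assert.h>
#include <stddef.h>

/* Return codes, mirroring the values in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_STOP_MASK | NOTIFY_OK)

/* Hypothetical event numbers; the real ones live in include/linux/node.h. */
enum model_event {
	MODEL_ADDING_FIRST_MEMORY,
	MODEL_CANCEL_ADDING_FIRST_MEMORY,
	MODEL_ADDED_FIRST_MEMORY,
};

/* Stands in for struct node_notify. */
struct model_notify {
	int nid;
};

/* Stands in for struct notifier_block, plus per-consumer state. */
struct model_block {
	int (*call)(struct model_block *self, unsigned long action, void *arg);
	int priority;
	int prepared;         /* set when this consumer saw ADDING */
	int unpaired_cancels; /* CANCEL events that had no matching ADDING */
};

#define MODEL_MAX_BLOCKS 8
static struct model_block *chain[MODEL_MAX_BLOCKS];
static size_t chain_len;

/* Insert a block so that the chain stays sorted by descending priority. */
void model_register(struct model_block *nb)
{
	size_t i = chain_len;

	assert(chain_len < MODEL_MAX_BLOCKS);
	while (i > 0 && chain[i - 1]->priority < nb->priority) {
		chain[i] = chain[i - 1];
		i--;
	}
	chain[i] = nb;
	chain_len++;
}

/* Call consumers in order; stop as soon as one sets NOTIFY_STOP_MASK. */
int model_call_chain(unsigned long action, void *arg)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < chain_len; i++) {
		ret = chain[i]->call(chain[i], action, arg);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* A consumer that vetoes the ADDING step. */
int veto_cb(struct model_block *self, unsigned long action, void *arg)
{
	(void)self;
	(void)arg;
	return action == MODEL_ADDING_FIRST_MEMORY ? NOTIFY_BAD : NOTIFY_OK;
}

/* A defensive consumer: it only undoes work it actually did. */
int careful_cb(struct model_block *self, unsigned long action, void *arg)
{
	(void)arg;
	switch (action) {
	case MODEL_ADDING_FIRST_MEMORY:
		self->prepared = 1;
		break;
	case MODEL_CANCEL_ADDING_FIRST_MEMORY:
		if (!self->prepared)
			self->unpaired_cancels++; /* never saw ADDING */
		self->prepared = 0;
		break;
	default:
		break;
	}
	return NOTIFY_OK;
}

/* Simplified driver: returns 0 on success, -1 if a consumer vetoed. */
int model_add_first_memory(int nid)
{
	struct model_notify arg = { .nid = nid };

	if (model_call_chain(MODEL_ADDING_FIRST_MEMORY, &arg) == NOTIFY_BAD) {
		/* The cancel event is sent to the whole chain again. */
		model_call_chain(MODEL_CANCEL_ADDING_FIRST_MEMORY, &arg);
		return -1;
	}
	model_call_chain(MODEL_ADDED_FIRST_MEMORY, &arg);
	return 0;
}
```

If ``veto_cb`` is registered with a higher priority than ``careful_cb``, the
ADDING event stops at the veto and never reaches the careful consumer, yet
the CANCEL event still does. This is why the sketch tracks ``prepared`` per
consumer instead of assuming that every cancel has a matching preparation.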

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)

Especially, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() - to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.