.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Memory hotplug events are sent to notification queues (notifier chains): one
for memory blocks and one for NUMA nodes, as described below.

Memory notifier
----------------

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available, so that subsystems can
  prepare to handle it. The page allocator still cannot allocate from the
  new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations from the
  memory are no longer possible, but some of the memory to be offlined is
  still in use. The callback can be used to free memory that a subsystem
  holds in the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory from the memory block that
  we attempted to offline is available again.

MEM_OFFLINE
  Generated after offlining memory is complete.

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher priority values are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the
notifier_block in the notifier chain that points to the callback function
itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
	};

- start_pfn is the first page frame number (PFN) of the memory being
  onlined or offlined.
- nr_pages is the number of pages of memory being onlined or offlined.
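
As a worked example of the two fields, the following sketch turns a
struct memory_notify range into an exclusive end PFN and a physical byte
range. It is a userspace model, not kernel code: the struct is re-declared
locally, the ``model_*`` helpers are hypothetical, and a 4 KiB page size
(``MODEL_PAGE_SHIFT`` of 12) is an assumption. In-kernel code would use
``PAGE_SHIFT`` or ``PFN_PHYS()`` instead.

.. code-block:: c

	#include <assert.h>
	#include <stdint.h>

	/* Local re-declaration for this userspace model only. */
	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
	};

	/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT is per-architecture. */
	#define MODEL_PAGE_SHIFT 12

	/* First PFN past the end of the range (exclusive bound). */
	static unsigned long model_end_pfn(const struct memory_notify *mn)
	{
		/* The range must not wrap around the PFN space. */
		assert(mn->start_pfn + mn->nr_pages >= mn->start_pfn);
		return mn->start_pfn + mn->nr_pages;
	}

	/* Physical address of the first byte in the range. */
	static uint64_t model_start_phys(const struct memory_notify *mn)
	{
		return (uint64_t)mn->start_pfn << MODEL_PAGE_SHIFT;
	}

	/* Size of the range in bytes. */
	static uint64_t model_size_bytes(const struct memory_notify *mn)
	{
		return (uint64_t)mn->nr_pages << MODEL_PAGE_SHIFT;
	}

For instance, with 4 KiB pages a 128 MiB block at physical address 4 GiB has
start_pfn 0x100000 and nr_pages 0x8000, so its exclusive end PFN is 0x108000.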

It is possible to be notified of MEM_CANCEL_ONLINE without having been
notified of MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
MEM_GOING_OFFLINE.
This can happen when a consumer fails: the call chain is broken and the
remaining consumers of the notifier are not called, so they never see the
MEM_GOING_* event but still receive the corresponding MEM_CANCEL_* event.
Users of memory_notify must therefore make no assumptions and be prepared to
handle such cases.
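
A callback can cope with an unmatched cancel event by recording what it
actually did on MEM_GOING_ONLINE and undoing only that. The sketch below
models this in userspace; the ``model_*`` action codes and the per-subsystem
state are hypothetical stand-ins, not the kernel's MEM_* values or types.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	/* Hypothetical action codes for this model; real callbacks switch on
	 * the MEM_* constants from include/linux/memory.h. */
	enum model_mem_action {
		MODEL_MEM_GOING_ONLINE,
		MODEL_MEM_CANCEL_ONLINE,
		MODEL_MEM_ONLINE,
	};

	/* Hypothetical per-subsystem state. */
	struct model_subsys {
		bool prepared;		/* set up for the range, not yet committed */
		int prepare_count;	/* times preparation was done */
		int undo_count;		/* times preparation was undone */
	};

	static void model_handle(struct model_subsys *s, enum model_mem_action action)
	{
		switch (action) {
		case MODEL_MEM_GOING_ONLINE:
			s->prepared = true;
			s->prepare_count++;
			break;
		case MODEL_MEM_CANCEL_ONLINE:
			/* May arrive without GOING_ONLINE if a callback earlier in
			 * the chain failed: only undo what this subsystem did. */
			if (s->prepared) {
				s->prepared = false;
				s->undo_count++;
			}
			assert(s->undo_count <= s->prepare_count);
			break;
		case MODEL_MEM_ONLINE:
			/* The preparation is now committed; nothing to undo. */
			s->prepared = false;
			break;
		}
	}

An unmatched MEM_CANCEL_ONLINE is then a harmless no-op instead of a teardown
of state that was never set up.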

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, or NOTIFY_STOP,
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
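
The priority ordering and the stopping behavior of the return values can be
modeled in a few lines of userspace C. In the sketch below, the NOTIFY_*
values mirror ``include/linux/notifier.h``, while ``struct model_nb``,
``model_register()`` and ``model_call_chain()`` are simplified, hypothetical
stand-ins for the kernel's notifier_block and chain helpers (no locking, no
RCU). Registration keeps the list sorted by descending priority, and the call
loop stops at the first return value with NOTIFY_STOP_MASK set, which covers
both NOTIFY_BAD and NOTIFY_STOP.

.. code-block:: c

	#include <assert.h>
	#include <stddef.h>

	/* Return values as defined in include/linux/notifier.h. */
	#define NOTIFY_DONE		0x0000
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
	#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

	/* Simplified stand-in for struct notifier_block (hypothetical). */
	struct model_nb {
		int (*call)(struct model_nb *self, unsigned long action, void *arg);
		int priority;
		struct model_nb *next;
	};

	/* Keep the list sorted by descending priority; equal priorities
	 * run in registration order. */
	static void model_register(struct model_nb **head, struct model_nb *nb)
	{
		while (*head && (*head)->priority >= nb->priority)
			head = &(*head)->next;
		nb->next = *head;
		*head = nb;
	}

	/* Call each callback in turn and stop at the first return value with
	 * NOTIFY_STOP_MASK set. *nr_calls reports how many callbacks ran. */
	static int model_call_chain(struct model_nb *head, unsigned long action,
				    void *arg, int *nr_calls)
	{
		int ret = NOTIFY_DONE;

		*nr_calls = 0;
		for (; head; head = head->next) {
			ret = head->call(head, action, arg);
			(*nr_calls)++;
			if (ret & NOTIFY_STOP_MASK)
				break;
		}
		return ret;
	}

	/* Example callbacks that count their invocations through arg. */
	static int model_ok_cb(struct model_nb *self, unsigned long action, void *arg)
	{
		(void)self;
		(void)action;
		(*(int *)arg)++;
		return NOTIFY_OK;
	}

	static int model_bad_cb(struct model_nb *self, unsigned long action, void *arg)
	{
		(void)self;
		(void)action;
		(*(int *)arg)++;
		return NOTIFY_BAD;
	}

Registering ``model_bad_cb`` at priority 10 ahead of ``model_ok_cb`` at
priority 0 makes the chain stop after one call, so the lower-priority callback
never sees the event. This is the situation in which it can later receive a
cancel event without the matching MEM_GOING_* event.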

NUMA node notifier
------------------

There are six types of notification defined in ``include/linux/node.h``:

NODE_ADDING_FIRST_MEMORY
  Generated before memory becomes available to this node for the first time.

NODE_CANCEL_ADDING_FIRST_MEMORY
  Generated if NODE_ADDING_FIRST_MEMORY fails.

NODE_ADDED_FIRST_MEMORY
  Generated when memory has become available to this node for the first time.

NODE_REMOVING_LAST_MEMORY
  Generated when the last memory available to this node is about to be offlined.

NODE_CANCEL_REMOVING_LAST_MEMORY
  Generated if NODE_REMOVING_LAST_MEMORY fails.

NODE_REMOVED_LAST_MEMORY
  Generated when the last memory available to this node has been offlined.
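
The "first memory" and "last memory" conditions can be read as transitions of
a node's amount of online memory to or from zero. The sketch below models that
reading in userspace; the ``model_*`` names and event codes are hypothetical,
and the kernel's actual check may differ (for example, it may consult per-node
state rather than a plain page counter).

.. code-block:: c

	#include <assert.h>

	/* Hypothetical event codes; real code uses the NODE_* constants
	 * from include/linux/node.h. */
	enum model_node_event {
		MODEL_NODE_NONE,
		MODEL_NODE_ADDING_FIRST_MEMORY,
		MODEL_NODE_REMOVING_LAST_MEMORY,
	};

	/* Event, if any, to send before onlining nr_pages on a node that
	 * currently has online_pages of online memory. */
	static enum model_node_event
	model_before_online(unsigned long online_pages, unsigned long nr_pages)
	{
		if (online_pages == 0 && nr_pages > 0)
			return MODEL_NODE_ADDING_FIRST_MEMORY;
		return MODEL_NODE_NONE;
	}

	/* Event, if any, to send before offlining nr_pages from that node. */
	static enum model_node_event
	model_before_offline(unsigned long online_pages, unsigned long nr_pages)
	{
		assert(nr_pages <= online_pages);
		if (nr_pages > 0 && nr_pages == online_pages)
			return MODEL_NODE_REMOVING_LAST_MEMORY;
		return MODEL_NODE_NONE;
	}

Onlining a second memory block on a node that already has memory, or
offlining one of several blocks, generates no node event in this model.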

A callback routine can be registered by calling::

  hotplug_node_notifier(callback_func, priority)

Callback functions with higher priority values are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the
notifier_block in the notifier chain that points to the callback function
itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to struct node_notify::

	struct node_notify {
		int nid;
	};

- nid is the node to which memory is being added, or from which it is
  being removed.

It is possible to be notified of NODE_CANCEL_ADDING_FIRST_MEMORY without
having been notified of NODE_ADDING_FIRST_MEMORY, and the same applies to
NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
This can happen when a consumer fails: the call chain is broken and the
remaining consumers of the notifier are not called. Users of node_notify
must therefore make no assumptions and be prepared to handle such cases.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, or NOTIFY_STOP,
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on further processing.

NOTIFY_BAD is used as a response to the NODE_ADDING_FIRST_MEMORY,
NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY, or
NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
It stops further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Please note that callbacks should not fail for NODE_ADDED_FIRST_MEMORY or
NODE_REMOVED_LAST_MEMORY, as the memory hotplug code cannot roll back at
that point anymore.
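
The following sketch shows one way a callback might honor that rule: it may
veto NODE_ADDING_FIRST_MEMORY while rollback is still possible, and it treats
NODE_ADDED_FIRST_MEMORY and NODE_REMOVED_LAST_MEMORY as infallible. It is a
userspace model; the action codes, ``struct model_node_state`` and its
simulated setup outcome (``setup_ok``) are hypothetical, while the NOTIFY_*
values mirror ``include/linux/notifier.h``.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	/* Return values as defined in include/linux/notifier.h. */
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

	/* Hypothetical action codes; real code uses the NODE_* constants
	 * from include/linux/node.h. */
	enum model_node_action {
		MODEL_NODE_ADDING_FIRST_MEMORY,
		MODEL_NODE_CANCEL_ADDING_FIRST_MEMORY,
		MODEL_NODE_ADDED_FIRST_MEMORY,
		MODEL_NODE_REMOVING_LAST_MEMORY,
		MODEL_NODE_CANCEL_REMOVING_LAST_MEMORY,
		MODEL_NODE_REMOVED_LAST_MEMORY,
	};

	/* Hypothetical per-node state kept by a subsystem. */
	struct model_node_state {
		bool setup_ok;		/* simulated outcome of per-node setup */
		bool have_data;		/* per-node data currently allocated */
	};

	static int model_node_cb(struct model_node_state *st,
				 enum model_node_action action)
	{
		switch (action) {
		case MODEL_NODE_ADDING_FIRST_MEMORY:
			/* Vetoing is allowed here: hotplug can still roll back. */
			if (!st->setup_ok)
				return NOTIFY_BAD;
			st->have_data = true;
			break;
		case MODEL_NODE_CANCEL_ADDING_FIRST_MEMORY:
		case MODEL_NODE_REMOVED_LAST_MEMORY:
			/* Idempotent teardown, safe even without a prior ADDING;
			 * REMOVED must never fail. */
			st->have_data = false;
			break;
		case MODEL_NODE_ADDED_FIRST_MEMORY:
			/* Nothing left to do, and ADDED must never fail. */
			break;
		case MODEL_NODE_REMOVING_LAST_MEMORY:
		case MODEL_NODE_CANCEL_REMOVING_LAST_MEMORY:
			/* Keep the data until the memory is actually gone. */
			break;
		}
		return NOTIFY_OK;
	}

Doing all fallible work in the NODE_ADDING_FIRST_MEMORY step is what allows the
NODE_ADDED_FIRST_MEMORY step to be a no-op that cannot fail.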

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know that nobody is in a critical section.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).

In particular, holding device_hotplug_lock avoids a possible lock inversion
when memory is being added and user space tries to online that memory faster
than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock.
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device becomes visible to user space before add_memory_resource() takes
the device_lock(), the two paths can run concurrently, which can result in a
lock inversion.
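
The inversion, and how an outer lock prevents it, can be reproduced with three
userspace mutexes named after the kernel locks. This is a simplified model,
not the kernel's call paths: each hypothetical thread function stands in for
one of the two paths listed above, and both take the outer
``model_device_hotplug_lock`` first. Because the outer lock serializes the two
paths, their opposite inner orderings never interleave; removing the outer
lock/unlock pair from both functions reintroduces the AB-BA pattern, which can
deadlock.

.. code-block:: c

	#include <assert.h>
	#include <pthread.h>
	#include <stddef.h>

	/* Userspace stand-ins for the kernel locks named above. */
	static pthread_mutex_t model_device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t model_device_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t model_mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

	static int model_onlined, model_added;

	/* Models device_online(): device_lock, then mem_hotplug_lock. */
	static void *model_online_path(void *unused)
	{
		(void)unused;
		pthread_mutex_lock(&model_device_hotplug_lock);
		pthread_mutex_lock(&model_device_lock);
		pthread_mutex_lock(&model_mem_hotplug_lock);
		model_onlined++;
		pthread_mutex_unlock(&model_mem_hotplug_lock);
		pthread_mutex_unlock(&model_device_lock);
		pthread_mutex_unlock(&model_device_hotplug_lock);
		return NULL;
	}

	/* Models add_memory_resource(): mem_hotplug_lock, then device_lock. */
	static void *model_add_path(void *unused)
	{
		(void)unused;
		pthread_mutex_lock(&model_device_hotplug_lock);
		pthread_mutex_lock(&model_mem_hotplug_lock);
		pthread_mutex_lock(&model_device_lock);
		model_added++;
		pthread_mutex_unlock(&model_device_lock);
		pthread_mutex_unlock(&model_mem_hotplug_lock);
		pthread_mutex_unlock(&model_device_hotplug_lock);
		return NULL;
	}

	/* Run both paths concurrently, iterations times. */
	static void model_run(int iterations)
	{
		for (int i = 0; i < iterations; i++) {
			pthread_t a, b;
			int rc;

			rc = pthread_create(&a, NULL, model_online_path, NULL);
			assert(rc == 0);
			rc = pthread_create(&b, NULL, model_add_path, NULL);
			assert(rc == 0);
			(void)rc;
			pthread_join(a, NULL);
			pthread_join(b, NULL);
		}
	}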

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialize memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a fairly efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect that memory from
vanishing.
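
The reader/writer split can be sketched with a POSIX read-write lock standing
in for mem_hotplug_lock. The ``model_*`` helpers are hypothetical userspace
analogues of get_online_mems()/put_online_mems() (read side) and of the write
side taken by hotplug operations (in the kernel, mem_hotplug_begin() and
mem_hotplug_done()). The kernel's lock is a per-CPU read-write semaphore,
which is what keeps the read side cheap; that aspect is not modeled here.

.. code-block:: c

	#include <assert.h>
	#include <pthread.h>
	#include <stdbool.h>

	/* Stand-in for mem_hotplug_lock (hypothetical userspace model). */
	static pthread_rwlock_t model_mem_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;

	/* Read side, like get_online_mems(): hotplug is excluded until the
	 * matching put. Readers do not exclude each other. */
	static void model_get_online_mems(void)
	{
		int rc = pthread_rwlock_rdlock(&model_mem_hotplug_lock);

		assert(rc == 0);
		(void)rc;
	}

	/* Read side release, like put_online_mems(). */
	static void model_put_online_mems(void)
	{
		pthread_rwlock_unlock(&model_mem_hotplug_lock);
	}

	/* Write side, as taken by a hotplug operation. Non-blocking here so
	 * the model can show that an active reader keeps the writer out. */
	static bool model_try_hotplug_begin(void)
	{
		return pthread_rwlock_trywrlock(&model_mem_hotplug_lock) == 0;
	}

	static void model_hotplug_done(void)
	{
		pthread_rwlock_unlock(&model_mem_hotplug_lock);
	}

While any reader holds the lock, the write side cannot be taken, so the memory
the reader is accessing cannot be offlined underneath it; several readers can
hold the lock at the same time.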
196