===========================================
Automatically bind swap device to NUMA node
===========================================

If the system has more than one swap device and the swap devices have node
information, the kernel can use that information in get_swap_pages() to decide
which swap device to use, which can improve performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which the devices
are used. Automatic binding requires no manipulation of the priority settings
for swap devices. For example, on a 2-node machine, assume two swap devices,
swapA and swapB, are going to be swapped on, with swapA attached to node 0 and
swapB attached to node 1. Simply swap them on::

	# swapon /dev/swapA
	# swapon /dev/swapB

Node 0 will then use the two swap devices in the order swapA, then swapB, and
node 1 will use them in the order swapB, then swapA. Note that the order in
which the devices are swapped on doesn't matter.

Here is a more complex example on a 4-node machine. Assume six swap devices are
going to be swapped on: swapA and swapB are attached to node 0, swapC is
attached to node 1, swapD and swapE are attached to node 2, and swapF is
attached to node 3. The way to swap them on is the same as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Node 0 will then use them in the order::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used round robin before any other swap device.

Node 1 will use them in the order::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used round robin before any other swap
devices.

Node 3 will use them in the order::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE

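The per-node orders listed above can be reproduced with a small model. This is
only an illustrative sketch, not kernel code: the helper names
``node_swap_order`` and ``format_order`` are hypothetical, and the model assumes
every device uses a system-assigned priority (no ``swapon -p``).

```python
def node_swap_order(devices, node):
    """Return the order in which `node` tries swap devices.

    `devices` is a list of (name, home_node) pairs in swapon order, all
    with system-assigned priority. Each entry of the result is a group of
    devices that are used round robin among themselves.
    """
    # Devices attached to this node come first, as one round-robin group.
    local = [name for name, home in devices if home == node]
    # All other devices follow one at a time, in swapon order.
    remote = [[name] for name, home in devices if home != node]
    return ([local] if local else []) + remote


def format_order(groups):
    """Render groups in the notation used above, e.g. 'swapA/swapB -> swapC'."""
    return " -> ".join("/".join(group) for group in groups)


# The 4-node example: (device name, node it is attached to), in swapon order.
DEVICES = [("swapA", 0), ("swapB", 0), ("swapC", 1),
           ("swapD", 2), ("swapE", 2), ("swapF", 3)]

for node in range(4):
    print(f"node {node}: {format_order(node_swap_order(DEVICES, node))}")
```

Running the sketch prints one line per node in the same ``->`` notation, which
can be compared against the orders given in the text.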

Implementation details
======================

Previously, the kernel used a single global priority-based list,
swap_avail_list, to decide which swap device to use; if multiple swap devices
shared the same priority, they were used round robin. That single global
swap_avail_list is replaced with a per-NUMA-node list, i.e. each NUMA node
sees its own priority-based list of available swap devices. A swap device's
priority can be promoted on its matching node's swap_avail_list.
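
As a rough illustration of per-node lists with round-robin use of
equal-priority devices, consider the toy model below. It is a sketch under
stated assumptions, not the kernel's data structure: the ``AvailList`` class is
hypothetical, devices never fill up, and the priority numbers are illustrative
values that follow the scheme described in this section.

```python
class AvailList:
    """Toy per-node list of available swap devices, highest priority first."""

    def __init__(self):
        # (priority, name) pairs, kept sorted by priority in descending order.
        self.entries = []

    def add(self, name, priority):
        # Insert after every entry whose priority is >= the new one, so that
        # equal-priority devices keep their insertion order.
        pos = 0
        while pos < len(self.entries) and self.entries[pos][0] >= priority:
            pos += 1
        self.entries.insert(pos, (priority, name))

    def pick(self):
        # Use the head device, then requeue it behind the other devices of
        # the same priority so that equal-priority devices take turns.
        priority, name = self.entries.pop(0)
        self.add(name, priority)
        return name


# One list per node instead of a single global list (assumed 2 lists here).
avail = {0: AvailList(), 1: AvailList()}
# Node 0's view: swapA and swapB are local and promoted to -1.
for name, prio in [("swapA", -1), ("swapB", -1), ("swapC", -4)]:
    avail[0].add(name, prio)
# Node 1's view: only swapC is local and promoted to -1.
for name, prio in [("swapA", -2), ("swapB", -3), ("swapC", -1)]:
    avail[1].add(name, prio)

print([avail[0].pick() for _ in range(4)])
print([avail[1].pick() for _ in range(3)])
```

In this model node 0 alternates between its two local devices, while node 1
keeps returning its single local device, because lower-priority devices are
only reached once the higher-priority ones are no longer available (which the
sketch does not simulate).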

A swap device's priority is set as follows: the user can set a value >= 0, or
the system picks one. Previously, system-picked priorities started from -1 and
went downwards. The priority value stored in the swap_avail_list is the
negation of the swap device's priority, because a plist is sorted from low to
high. The new policy doesn't change the semantics for priorities >= 0.
System-picked priorities now start from -2 and go downwards, and -1 is
reserved as the promoted value. So if multiple swap devices are attached to the
same node, they are all promoted to priority -1 on that node's plist and are
used round robin before any other swap devices.
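
The priority arithmetic described above can be worked through in a few lines.
The sketch below is an illustration only: ``auto_priorities`` and
``plist_keys`` are hypothetical helper names, and it models only devices with
system-picked priorities (user-set priorities >= 0 are left out).

```python
PROMOTED_PRIO = -1  # reserved for a device on its own node's list


def auto_priorities(devices):
    """Assign system-picked priorities -2, -3, ... in swapon order."""
    return {name: -2 - i for i, (name, _home) in enumerate(devices)}


def plist_keys(devices, node):
    """Return (key, name) pairs for `node`, sorted low to high.

    The key is the negated priority, so a higher priority sorts earlier.
    Devices sharing a key are round-robin peers; they are ordered by name
    here only to make the output deterministic.
    """
    base = auto_priorities(devices)
    keys = []
    for name, home in devices:
        # A device is promoted only on the list of the node it is attached to.
        prio = PROMOTED_PRIO if home == node else base[name]
        keys.append((-prio, name))
    return sorted(keys)


# The 4-node example: (device name, node it is attached to), in swapon order.
DEVICES = [("swapA", 0), ("swapB", 0), ("swapC", 1),
           ("swapD", 2), ("swapE", 2), ("swapF", 3)]

for key, name in plist_keys(DEVICES, 2):
    print(key, name)
```

For node 2, swapD and swapE both end up with key 1 (priority -1), ahead of
swapA (key 2), swapB (key 3), swapC (key 4) and swapF (key 7), which matches
the order given for node 2 in the example.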