.. SPDX-License-Identifier: GPL-2.0

:Author: Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>

==========
Swap Table
==========

The swap table implements the swap cache as a per-cluster array of swap
cache values.

Swap Entry
----------

A swap entry contains the information required to serve an anonymous page
fault.

A swap entry is encoded in two parts: the swap type and the swap offset.

The swap type indicates which swap device to use.
The swap offset is the offset within the swap file from which the page data
is read.
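
As an illustration of the type/offset split, the following user-space sketch
packs both parts into a single 64-bit word. The 5-bit type field, the
``DEMO_*`` constants and the ``demo_*`` helpers are assumptions made up for
this example. The real kernel layout is architecture dependent and is not
described by this document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the top DEMO_TYPE_BITS bits hold the swap type,
 * the remaining low bits hold the swap offset. */
#define DEMO_TYPE_BITS   5
#define DEMO_OFFSET_BITS (64 - DEMO_TYPE_BITS)
#define DEMO_OFFSET_MASK ((UINT64_C(1) << DEMO_OFFSET_BITS) - 1)

typedef struct { uint64_t val; } demo_swp_entry_t;

/* Pack a swap type and a swap offset into one entry. */
static inline demo_swp_entry_t demo_swp_entry(unsigned int type, uint64_t offset)
{
	demo_swp_entry_t e;

	assert(type < (1u << DEMO_TYPE_BITS));
	assert(offset <= DEMO_OFFSET_MASK);
	e.val = ((uint64_t)type << DEMO_OFFSET_BITS) | offset;
	return e;
}

/* Which swap device the entry refers to. */
static inline unsigned int demo_swp_type(demo_swp_entry_t e)
{
	return (unsigned int)(e.val >> DEMO_OFFSET_BITS);
}

/* Where in that device the page data lives. */
static inline uint64_t demo_swp_offset(demo_swp_entry_t e)
{
	return e.val & DEMO_OFFSET_MASK;
}
```

Packing both parts into one word keeps the entry small enough to be stored
wherever a single machine word fits.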

Swap Cache
----------

The swap cache is a map that looks up folios using the swap entry as the key.
The resulting value can be one of three types, depending on which stage the
swap entry is in.

1. NULL: This swap entry is not used.

2. folio: A folio has been allocated and bound to this swap entry. This is
   the transient state of swap out or swap in. The folio data can be in
   the folio, in the swap file, or in both.

3. shadow: The shadow contains the working set information of the swapped
   out folio. This is the normal state for a swapped out page.
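
The three result types can be told apart from the stored value alone. The
sketch below shows one way to do that in user space. Using bit 0 to mark a
shadow is an assumption for this example, as are ``struct demo_folio`` and
all ``demo_*`` names. It is not a description of the kernel's actual
encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_folio { int id; };	/* stand-in for struct folio */

enum demo_state { DEMO_UNUSED, DEMO_FOLIO, DEMO_SHADOW };

/* Assumption: folio pointers are at least 2-byte aligned, so bit 0 is
 * free to mark a shadow value. */
static inline uintptr_t demo_mk_shadow(uintptr_t workingset)
{
	return (workingset << 1) | 1;
}

static inline uintptr_t demo_mk_folio(struct demo_folio *folio)
{
	assert(((uintptr_t)folio & 1) == 0);
	return (uintptr_t)folio;
}

/* Classify a swap cache value into one of the three states. */
static inline enum demo_state demo_classify(uintptr_t v)
{
	if (v == 0)
		return DEMO_UNUSED;	/* NULL: entry not used */
	return (v & 1) ? DEMO_SHADOW : DEMO_FOLIO;
}
```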

Swap Table Internals
--------------------

The previous swap cache was implemented with an XArray. The XArray is a tree
structure, so each lookup goes through multiple nodes. Can we do better?

Notice that most of the time when we look up the swap cache, we are in
either a swap in or a swap out path. We should already have the swap cluster
that contains the swap entry.

If each cluster has its own array to store swap cache values, a swap cache
lookup within the cluster becomes a very simple array lookup.

We give such a per-cluster swap cache value array a name: the swap table.

A swap table is an array of pointers. Each pointer is the same size as a
PTE. The size of a swap table for one swap cluster typically matches a PTE
page table, which is one page on modern 64-bit systems.
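
To make the array lookup concrete, here is a minimal user-space sketch. It
assumes a 4 KiB page and pointer-sized slots, which gives 512 slots per
table on a 64-bit system. ``DEMO_PAGE_SIZE``, ``struct demo_cluster`` and
the ``demo_*`` helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE       4096u	/* assumed page size */
#define DEMO_CLUSTER_ENTRIES (DEMO_PAGE_SIZE / sizeof(uintptr_t))

/* One swap table per cluster: a flat array of pointer-sized slots. */
struct demo_cluster {
	uintptr_t table[DEMO_CLUSTER_ENTRIES];
};

/* The table occupies exactly one page under the assumptions above. */
_Static_assert(sizeof(struct demo_cluster) == DEMO_PAGE_SIZE,
	       "swap table should fill one page");

/* Which cluster a swap offset belongs to. */
static inline size_t demo_cluster_index(uint64_t offset)
{
	return (size_t)(offset / DEMO_CLUSTER_ENTRIES);
}

/* Slot of the swap offset inside its cluster's table. */
static inline size_t demo_table_index(uint64_t offset)
{
	return (size_t)(offset % DEMO_CLUSTER_ENTRIES);
}

/* Lookup is a single indexed load, with no tree walk. */
static inline uintptr_t demo_lookup(const struct demo_cluster *clusters,
				    uint64_t offset)
{
	return clusters[demo_cluster_index(offset)].table[demo_table_index(offset)];
}
```

When the caller already holds the cluster, as on the swap in and swap out
paths, only the index within the table remains to be computed.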

With the swap table, swap cache lookup gains much better locality and
becomes simpler and faster.

Locking
-------

Swap table modification requires taking the cluster lock. If a folio
is being added to or removed from the swap table, the folio must be
locked before the cluster lock is taken. After the addition or removal is
done, the folio shall be unlocked.

Swap table lookup is protected by RCU and atomic reads. If the lookup
returns a folio, the user must lock the folio before use.
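
The lock ordering above can be sketched in user space with pthread mutexes
standing in for the folio lock and the cluster lock, and a C11 atomic load
standing in for the lockless read. This is only an analogy under those
assumptions: RCU is not modeled, shadow values are left out, and every
``demo_*`` name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ENTRIES 512

struct demo_folio {
	pthread_mutex_t lock;		/* stand-in for the folio lock */
};

struct demo_cluster {
	pthread_mutex_t lock;		/* stand-in for the cluster lock */
	_Atomic uintptr_t table[DEMO_ENTRIES];
};

/* Modification: folio lock first, then cluster lock, released in
 * reverse order once the slot has been updated. */
static void demo_add_folio(struct demo_cluster *ci, size_t idx,
			   struct demo_folio *folio)
{
	pthread_mutex_lock(&folio->lock);
	pthread_mutex_lock(&ci->lock);
	atomic_store_explicit(&ci->table[idx], (uintptr_t)folio,
			      memory_order_release);
	pthread_mutex_unlock(&ci->lock);
	pthread_mutex_unlock(&folio->lock);
}

/* Lookup: a single atomic read without taking the cluster lock. The
 * caller must still lock the returned folio before using it. */
static struct demo_folio *demo_lookup(struct demo_cluster *ci, size_t idx)
{
	return (struct demo_folio *)atomic_load_explicit(&ci->table[idx],
							 memory_order_acquire);
}
```

Taking the two locks in one fixed order on every modification path is what
prevents two writers from deadlocking against each other.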